[petsc-users] Petsc + nvhpc
howen
herbert.owen at bsc.es
Fri Nov 14 09:08:24 CST 2025
Thank you very much Matthew,
I did what you suggested and I also added
ierr = MatView(*amat, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
Now that I can see the matrices, I notice that some values differ. I will debug and simplify my code to try to understand where the difference comes from.
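In case it helps the comparison, here is a minimal sketch (the file name is just a placeholder) of writing the same matrix in PETSc binary format, so the CPU and GPU outputs can be diffed exactly rather than read from stdout:

PetscViewer viewer;
ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "amat.bin", FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
ierr = MatView(*amat, viewer); CHKERRQ(ierr);            /* same Mat as in the line above, written in binary */
ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);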
As soon as I have a clearer picture I will get back to you.
Best,
Herbert Owen
Senior Researcher, Dpt. Computer Applications in Science and Engineering
Barcelona Supercomputing Center (BSC-CNS)
Tel: +34 93 413 4038
Skype: herbert.owen
https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
> On 13 Nov 2025, at 18:23, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Thu, Nov 13, 2025 at 12:11 PM howen via petsc-users <petsc-users at mcs.anl.gov> wrote:
>> Dear Junchao,
>>
>> Thank you for your response and sorry for taking so long to answer.
>> I cannot avoid using the NVIDIA tools. gfortran is not mature enough for OpenACC and gives us problems when compiling our code.
>> What I have done to be able to use the latest PETSc is to write my own C code to call PETSc.
>> I have little experience with C and it took me some time, but I can now use PETSc 3.24.1 ;)
>>
>> The behaviour remains the same as in my original email.
>> Parallel + GPU gives bad results. CPU (serial and parallel) and serial GPU all work correctly and give the same result.
>>
>> I have gone a bit into PETSc, comparing the CPU and GPU versions with 2 MPI ranks.
>> I see that the difference starts in
>> src/ksp/ksp/impls/cg/cg.c L170
>> PetscCall(KSP_PCApply(ksp, R, Z)); /* z <- Br */
>> I have printed the vectors R and Z and the norm dp.
>> R is identical on CPU and GPU, but Z differs.
>> The correct value of dp (the first time this point is reached) is 14.3014, while running on the GPU with 2 MPI ranks gives 14.7493.
>> If you wish, I can send you the prints I introduced in cg.c.
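>> Roughly, the instrumentation looks like the sketch below (nr and nz are placeholder names; the exact prints differ):
>>
>> PetscReal nr, nz;
>> PetscCall(VecNorm(R, NORM_2, &nr));
>> PetscCall(VecView(R, PETSC_VIEWER_STDOUT_WORLD));  /* identical on CPU and GPU */
>> PetscCall(KSP_PCApply(ksp, R, Z));                 /* z <- Br */
>> PetscCall(VecNorm(Z, NORM_2, &nz));
>> PetscCall(VecView(Z, PETSC_VIEWER_STDOUT_WORLD));  /* differs on the GPU with 2 ranks */
>> PetscCall(PetscPrintf(PETSC_COMM_WORLD, "||R|| = %g  ||Z|| = %g\n", (double)nr, (double)nz));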
>
> Thank you for all the detail in this report. However, since you see a problem in KSPCG, I believe we can reduce the complexity. You can use
>
> -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin
>
> and send us those files. Then we can run your system directly using KSP ex10 (and so can you).
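> For reference, a bare-bones loader along these lines (just a sketch, not ex10 itself; the file names match the options above) would read the two files back in and repeat the solve:
>
> #include <petscksp.h>
>
> int main(int argc, char **argv)
> {
>   Mat         A;
>   Vec         b, x;
>   KSP         ksp;
>   PetscViewer viewer;
>
>   PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
>
>   /* matrix written by -ksp_view_mat binary:A.bin */
>   PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.bin", FILE_MODE_READ, &viewer));
>   PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>   PetscCall(MatSetFromOptions(A));     /* allows -mat_type to override the default AIJ */
>   PetscCall(MatLoad(A, viewer));
>   PetscCall(PetscViewerDestroy(&viewer));
>
>   /* right-hand side written by -ksp_view_rhs binary:b.bin */
>   PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "b.bin", FILE_MODE_READ, &viewer));
>   PetscCall(VecCreate(PETSC_COMM_WORLD, &b));
>   PetscCall(VecLoad(b, viewer));
>   PetscCall(PetscViewerDestroy(&viewer));
>
>   PetscCall(VecDuplicate(b, &x));
>
>   /* solver and preconditioner come from the options database, e.g. -ksp_type cg -pc_type jacobi */
>   PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
>   PetscCall(KSPSetOperators(ksp, A, A));
>   PetscCall(KSPSetFromOptions(ksp));
>   PetscCall(KSPSolve(ksp, b, x));
>
>   PetscCall(KSPDestroy(&ksp));
>   PetscCall(VecDestroy(&x));
>   PetscCall(VecDestroy(&b));
>   PetscCall(MatDestroy(&A));
>   PetscCall(PetscFinalize());
>   return 0;
> }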
>
> Thanks,
>
> Matt
>
>> The folder with the input files to run the case can be downloaded from https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ
>>
>> For submitting the gpu run I use
>> mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json
>>
>> For the cpu run
>> mpirun -np 2 /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d ChannelFlowSolverIncomp.json
>>
>> Our code can be downloaded with:
>> git clone --recursive https://gitlab.com/bsc_sod2d/sod2d_gitlab.git
>>
>> and the branch I am using with
>> git checkout 140-add-petsc
>>
>> To use exactly the same commit I am using
>> git checkout 09a923c9b57e46b14ae54b935845d50272691ace
>>
>>
>> The modules I currently have loaded are:
>> 1) nvidia-hpc-sdk/25.1 2) hdf5/1.14.1-2-nvidia-nvhpcx 3) cmake/3.25.1
>> I guess/hope similar modules are available on any supercomputer.
>>
>> To build the cpu version
>> mkdir build_cpu
>> cd build_cpu
>>
>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal
>> export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH
>> export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH
>> export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH
>> export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH
>> export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH
>>
>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF -DDEBUG_MODE=OFF ..
>> make -j 80
>>
>> I built PETSc myself as follows:
>>
>> git clone -b release https://gitlab.com/petsc/petsc.git petsc
>> cd petsc
>> git checkout v3.24.1
>> module purge
>> module load nvidia-hpc-sdk/25.1 hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1
>> ./configure --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal --with-fortran-bindings=0 --with-fc=0 --with-petsc-arch=linux-x86_64-opt --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1 --with-precision=single --download-hypre CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS= --with-shared-libraries=1 --with-mpi=1 --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/ --download-ptscotch=yes --download-metis --download-parmetis
>> make all check
>> make install
>>
>> -------------------
>> For the GPU version, when configuring PETSc I add: --with-cuda
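>> (Whether the solve then actually runs on the device is a run-time choice; assuming the Mat/Vec types are taken from the options database rather than hard-coded, options such as
>>
>> -vec_type cuda -mat_type aijcusparse
>>
>> select the CUDA back end, while omitting them keeps everything on the host.)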
>>
>> I then change the export PETSC_INSTALL to
>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal
>> and repeat all other exports
>>
>> mkdir build_gpu
>> cd build_gpu
>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON -DDEBUG_MODE=OFF ..
>> make -j 80
>>
>> As you can see from the submit instructions the executable is found in sod2d_gitlab/build_gpu/src/app_sod2d/sod2d
>>
>> I hope I have not forgotten anything and that my instructions are easy to follow. If you have any issue, do not hesitate to contact me.
>> The wiki for our code can be found at https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home
>>
>> Best,
>>
>> Herbert Owen
>> Senior Researcher, Dpt. Computer Applications in Science and Engineering
>> Barcelona Supercomputing Center (BSC-CNS)
>> Tel: +34 93 413 4038
>> Skype: herbert.owen
>>
>> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>>
>>> On 16 Oct 2025, at 18:30, Junchao Zhang <junchao.zhang at gmail.com> wrote:
>>>
>>> Hi, Herbert,
>>> I don't have much experience with OpenACC and the PETSc CI doesn't have such tests. Could you avoid using nvfortran and instead use gfortran to compile your Fortran + OpenACC code? If you can, then you can use the latest PETSc code and make our debugging easier.
>>> Also, could you provide us with a test and instructions to reproduce the problem?
>>>
>>> Thanks!
>>> --Junchao Zhang
>>>
>>>
>>> On Thu, Oct 16, 2025 at 5:07 AM howen via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>> Dear All,
>>>>
>>>> I am interfacing our CFD code (Fortran + OpenACC) with PETSc.
>>>> Since we use OpenACC, the natural choice for us is NVIDIA's nvhpc compiler. The GNU compiler does not work well for us and we do not have access to the Cray compiler.
>>>>
>>>> I already know that the latest version of PETSc does not compile with nvhpc, so I am using version 3.21.
>>>> I get good results on the CPU both in serial and in parallel (MPI). However, the GPU implementation, which is what we are interested in, only works correctly in serial. In parallel the results are different, even for a CG solve.
>>>>
>>>> I would like to know if you have experience with the NVIDIA compiler, and in particular whether you have already observed issues with it. Your opinion on whether it is worth putting further effort into finding a bug I may have introduced during the interfacing would be highly appreciated.
>>>>
>>>> Best,
>>>>
>>>> Herbert Owen
>>>> Senior Researcher, Dpt. Computer Applications in Science and Engineering
>>>> Barcelona Supercomputing Center (BSC-CNS)
>>>> Tel: +34 93 413 4038
>>>> Skype: herbert.owen
>>>>
>>>> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>>>>
>>
>
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/