[petsc-users] NVIDIA HPC SDK and complex data type

Jonathan D. Halverson halverson at Princeton.EDU
Mon Dec 20 08:19:47 CST 2021

Hi Junchao,

Thank you very much for your quick work. The simple build procedure now works.

Since it will take a while for NVIDIA to fix the bug in NVHPC 21.11 (December 2021), I added a workaround to the PETSc MR: https://gitlab.com/petsc/petsc/-/merge_requests/4663
I tested it, and it works both with debugging (-O0) and without debugging (-O or -O2).

--Junchao Zhang

  Yes, Junchao deserves a bounty from NVIDIA for this find.

Great find!

I found it is an NVIDIA C/C++ compiler bug.  I can reproduce it with the following test.c:

#include <stdlib.h>
#include <stdio.h>
#include <complex.h>

typedef double _Complex PetscScalar;
typedef struct {
  int row;
  PetscScalar *valaddr;
} MatEntry2;

int main(int argc, char **argv)
{
  int i = 2;
  MatEntry2 *Jentry2 = (MatEntry2*)malloc(64*sizeof(MatEntry2));
  PetscScalar a = 1, b = 1;

  Jentry2[2].valaddr = (PetscScalar*)malloc(16*sizeof(PetscScalar));
  *(Jentry2[i].valaddr) = a*b; // Segfault

  return 0;
}
$ nvc -O0 -o test test.c
$ ./test
Segmentation fault (core dumped)

If I change *(Jentry2[i].valaddr) = a*b; to

PetscScalar *p = Jentry2[2].valaddr;
*p = a*b;

Then the code works fine.  Changing -O0 to -O2 also avoids the error for this simple test, but not for PETSc.  In PETSc, I could apply the above silly trick, but I am not sure it is worth it; we should instead report the bug to NVIDIA.

Looking at the assembly code for the segfault line, we can find the problem:
  movslq  52(%rsp), %rcx
  movq  40(%rsp), %rax
  movq  8(%rax,%rcx,8), %rax   // Here %rax = Jentry2 (the array base) and %rcx = i.  The instruction wrongly
                               // calculates the address of Jentry2[i].valaddr as (%rax + %rcx*8) + 8;
                               // it should instead be (%rax + %rcx*16) + 8, since sizeof(MatEntry2) is 16.
  vmovsd  %xmm1, 8(%rax)
  vmovsd  %xmm0, (%rax)

--Junchao Zhang

Hi, Jon,
  I could reproduce the error exactly.  I will have a look.
  Thanks for reporting.
--Junchao Zhang

We are unable to build PETSc using the NVIDIA HPC SDK and --with-scalar-type=complex. Below is our procedure:

$ module load nvhpc/21.11
$ module load openmpi/nvhpc-21.11/4.1.2/64
$ git clone -b release https://gitlab.com/petsc/petsc.git petsc; cd petsc
$ ./configure --with-debugging=1 --with-scalar-type=complex PETSC_ARCH=openmpi-power
$ make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=openmpi-power all
$ make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=openmpi-power check

"make check" fails with a segmentation fault when running ex19. The Fortran test ex5f passes.

The procedure above fails on both x86_64 and POWER, each running RHEL 8. It also fails using nvhpc 20.7.

The procedure above works for "real" instead of "complex".

A "hello world" MPI code using a complex data type works with our nvhpc modules.

The procedure above works when GCC and an Open MPI library built with GCC are used.

The only trouble is the combination of PETSc with nvhpc and complex. Any known issues?

The build log for the procedure above is here:


