[petsc-users] NVIDIA HPC SDK and complex data type
Jonathan D. Halverson
halverson at Princeton.EDU
Mon Dec 20 08:19:47 CST 2021
Hi Junchao,
Thank you very much for your quick work. The simple build procedure now works.
Jon
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Sunday, December 19, 2021 6:38 PM
To: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Cc: Jonathan D. Halverson <halverson at Princeton.EDU>
Subject: Re: [petsc-users] NVIDIA HPC SDK and complex data type
Since it will take a while for NVIDIA to fix the bug in NVHPC 21.11 (December 2021), I added a workaround in a PETSc MR: https://gitlab.com/petsc/petsc/-/merge_requests/4663
I tested it and it works with debugging (-O0) or no debugging (-O, or -O2).
--Junchao Zhang
On Sat, Dec 18, 2021 at 7:51 PM Barry Smith <bsmith at petsc.dev<mailto:bsmith at petsc.dev>> wrote:
Yes, Junchao deserves a bounty from NVIDIA for this find.
On Dec 18, 2021, at 8:22 PM, Matthew Knepley <knepley at gmail.com<mailto:knepley at gmail.com>> wrote:
On Sat, Dec 18, 2021 at 7:03 PM Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>> wrote:
I found it is an NVIDIA C/C++ compiler bug. I can reproduce it with
Great find!
Matt
#include <stdlib.h>
#include <stdio.h>
#include <complex.h>

typedef double _Complex PetscScalar;

typedef struct {
  int          row;
  PetscScalar *valaddr;
} MatEntry2;

int main(int argc, char **argv)
{
  int          i = 2;
  MatEntry2   *Jentry2 = (MatEntry2 *)malloc(64 * sizeof(MatEntry2));
  PetscScalar  a = 1, b = 1;

  printf("sizeof(MatEntry2)=%lu\n", sizeof(MatEntry2));
  Jentry2[2].valaddr = (PetscScalar *)malloc(16 * sizeof(PetscScalar));
  *(Jentry2[i].valaddr) = a * b; // Segfault
  free(Jentry2[2].valaddr);
  free(Jentry2);
  return 0;
}
$ nvc -O0 -o test test.c
$ ./test
sizeof(MatEntry2)=16
Segmentation fault (core dumped)
If I change *(Jentry2[i].valaddr) = a*b; to

  PetscScalar *p = Jentry2[2].valaddr;
  *p = a*b;

then the code works fine. Changing -O0 to -O2 also avoids the error for this simple test, but not for PETSc. In PETSc I could apply the above trick, but I am not sure it is worth it; we should instead report the bug to NVIDIA.
Looking at the assembly code for the segfault line, we can find the problem:

  movslq  52(%rsp), %rcx
  movq    40(%rsp), %rax
  movq    8(%rax,%rcx,8), %rax  // Here %rax = Jentry2, %rcx = i. The instruction wrongly calculates &Jentry2[i].valaddr as (%rax + %rcx*8) + 8; it should be (%rax + %rcx*16) + 8, since sizeof(MatEntry2) is 16.
  vmovsd  %xmm1, 8(%rax)
  vmovsd  %xmm0, (%rax)
--Junchao Zhang
On Fri, Dec 17, 2021 at 7:58 PM Junchao Zhang <junchao.zhang at gmail.com<mailto:junchao.zhang at gmail.com>> wrote:
Hi, Jon,
I could reproduce the error exactly. I will have a look.
Thanks for reporting.
--Junchao Zhang
On Fri, Dec 17, 2021 at 2:56 PM Jonathan D. Halverson <halverson at princeton.edu<mailto:halverson at princeton.edu>> wrote:
Hello,
We are unable to build PETSc using the NVIDIA HPC SDK and --with-scalar-type=complex. Below is our procedure:
$ module load nvhpc/21.11
$ module load openmpi/nvhpc-21.11/4.1.2/64
$ git clone -b release https://gitlab.com/petsc/petsc.git petsc; cd petsc
$ ./configure --with-debugging=1 --with-scalar-type=complex PETSC_ARCH=openmpi-power
$ make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=openmpi-power all
$ make PETSC_DIR=/home/$USER/software/petsc PETSC_ARCH=openmpi-power check
"make check" fails with a segmentation fault when running ex19. The fortran test ex5f passes.
The procedure above fails on x86_64 and POWER both running RHEL8. It also fails using nvhpc 20.7.
The procedure above works with "real" instead of "complex".
A "hello world" MPI code using a complex data type works with our nvhpc modules.
The procedure above also succeeds when GCC and an Open MPI library built with GCC are used.
The only troublesome combination is PETSc with nvhpc and complex. Are there any known issues?
The build log for the procedure above is here:
https://tigress-web.princeton.edu/~jdh4/petsc_nvhpc_complex_17dec2021.log
Jon
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>