[petsc-users] KSP in OpenMP Parallel For Loop
Barry Smith
bsmith at mcs.anl.gov
Tue Apr 15 17:27:45 CDT 2014
David,
After wrestling with several problems on this, I realized that this branch only works if one is “lucky” in what one does. PETSc was originally written without any regard for threads, and in this branch I tried to “fix” all the cases where we accessed global variables, so that each thread could create and destroy its own objects. I did this in a way independent of the threading model you use (OpenMP or pthreads) by simply removing the use of global variables, not by introducing model-specific locks in the PETSc code.

Unfortunately, you have stumbled across a situation that does not involve global variables but rather variables stored inside MPI attributes; different threads change these, and since the changes are not protected by locks, the variables end up with incorrect values, which later causes crashes. Essentially, the variables in the MPI attributes are like global variables, because all the threads have access to them. Fixing this would require putting a number of locks in the PETSc code that deals with the MPI attributes.
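To make that concrete, here is a minimal standalone sketch of the failure mode (not PETSc source; the keyval and counter are invented for illustration): an integer counter stored as an MPI attribute on a communicator and incremented from several OpenMP threads with no lock. The attribute is visible to every thread, exactly like a global variable, so increments are lost at random; guarding each access with a lock is the kind of change that would be needed throughout the code that touches these attributes. Compile with, e.g., mpicc -fopenmp.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int  keyval, flag, provided;
  int *counter, *c;

  /* Threads will make MPI calls, so ask for full thread support. */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

  /* Attach a heap-allocated counter to MPI_COMM_WORLD as an attribute. */
  MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN, &keyval, NULL);
  counter  = (int *) malloc(sizeof(int));
  *counter = 0;
  MPI_Comm_set_attr(MPI_COMM_WORLD, keyval, counter);

  /* Every thread fetches the same attribute and does an unsynchronized
     read-modify-write on it, so updates are silently lost; this is the
     same failure mode as the tag/name counters PETSc keeps in MPI
     attributes (see the PetscCommDestroy() trace below).  Wrapping the
     increment in "#pragma omp critical" removes the race. */
#pragma omp parallel for private(c, flag)
  for (int i = 0; i < 100000; i++) {
    MPI_Comm_get_attr(MPI_COMM_WORLD, keyval, &c, &flag);
    (*c)++;
  }

  printf("counter = %d (expected 100000)\n", *counter);

  MPI_Comm_delete_attr(MPI_COMM_WORLD, keyval);
  MPI_Comm_free_keyval(&keyval);
  free(counter);
  MPI_Finalize();
  return 0;
}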
I’m sorry, but this won’t happen soon. The only good news is that within a month we will have a new guy revisiting PETSc’s own threading code (which focuses on several threads per object), and maybe between him and Jed they can expand that approach to handle the “different objects for different threads” needs that you have.
I wish we had such support currently, but it greatly increases the complexity of the PETSc code, so it has to be done well or we’ll be totally screwed.
Barry
On Apr 15, 2014, at 12:06 AM, D H <mrhyde at stanford.edu> wrote:
> Dear Barry,
>
> Hope you are having a good week so far!
>
> I am still working on trying to get OpenMP parallel for loops to play nicely with PETSc's KSP functions.
>
> I modified the KSP ex1 example that comes with PETSc so that it repeats its work 1000 times inside an omp parallel for loop, and sure enough, I was able to reproduce the same errors I get with my own program. I've attached the source code for my modified ex1 if you have any time to take a look and see whether it works on your machine. Once in a while this modified example runs all the way through without errors, but nine times out of ten it crashes somewhere in the 1000 iterations, at least on my machine. To avoid the compiler errors I was getting from OpenMP, I overrode the definitions of CHKERRQ and SETERRQ near the top of the file... this of course could have goofed things up, but I was just trying to make something quick and dirty that reproduced the errors I was getting from PETSc.
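> In outline, the modified loop looks like this (a rough sketch of the structure, not the exact attached main.cpp; error checking omitted, compiled with -fopenmp):
>
> #include <petscksp.h>
>
> int main(int argc, char **argv)
> {
>   PetscInitialize(&argc, &argv, NULL, NULL);
>
>   /* Each iteration assembles and solves its own small tridiagonal
>      system on PETSC_COMM_SELF, exactly as ex1 does once. */
>   #pragma omp parallel for
>   for (int k = 0; k < 1000; k++) {
>     Mat         A;
>     Vec         x, b;
>     KSP         ksp;
>     PetscInt    n = 10, i, col[3];
>     PetscScalar value[3] = {-1.0, 2.0, -1.0};
>
>     MatCreateSeqAIJ(PETSC_COMM_SELF, n, n, 3, NULL, &A);
>     for (i = 1; i < n - 1; i++) {
>       col[0] = i - 1; col[1] = i; col[2] = i + 1;
>       MatSetValues(A, 1, &i, 3, col, value, INSERT_VALUES);
>     }
>     i = 0;     col[0] = 0;     col[1] = 1;
>     MatSetValues(A, 1, &i, 2, col, &value[1], INSERT_VALUES);
>     i = n - 1; col[0] = n - 2; col[1] = n - 1;
>     MatSetValues(A, 1, &i, 2, col, value, INSERT_VALUES);
>     MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>     MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>
>     VecCreateSeq(PETSC_COMM_SELF, n, &x);
>     VecDuplicate(x, &b);
>     VecSet(b, 1.0);
>
>     KSPCreate(PETSC_COMM_SELF, &ksp);
>     KSPSetOperators(ksp, A, A); /* older releases take a fourth MatStructure argument */
>     KSPSetFromOptions(ksp);
>     KSPSolve(ksp, b, x);        /* crashes intermittently once threads touch shared state */
>
>     KSPDestroy(&ksp);
>     MatDestroy(&A);
>     VecDestroy(&x);
>     VecDestroy(&b);
>   }
>
>   PetscFinalize();
>   return 0;
> }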
>
> If it makes a difference (perhaps there is something amiss here), my configuration for this petsc branch is
> PETSC_ARCH=linux-gnu PETSC_DIR=/home/dabh/petsc --with-clanguage=cxx --download-hypre=1 --download-f-blas-lapack=1 --download-mpich=1 --with-debugging=0 COPTFLAGS="-O2 -march=native" CXXOPTFLAGS="-O2 -march=native" FOPTFLAGS="-O2 -march=native" --with-log=0
>
> and the full error message I get from petsc is
>
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Corrupt argument:
> see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind!
> [0]PETSC ERROR: MPI_Comm does not have tag/name counter nor does it have inner MPI_Comm!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Development GIT revision: 67670ee0cfa93f0c400c9bf0001548cd51aae596 GIT Date: 2013-11-16 17:43:36 -0600
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: ./petsc_ex12 on a linux-gnu named dabh by dabh Mon Apr 14 21:49:32 2014
> [0]PETSC ERROR: Libraries linked from /home/dabh/petsc/linux-gnu/lib
> [0]PETSC ERROR: Configure run at Tue Apr 8 17:53:42 2014
> [0]PETSC ERROR: Configure options PETSC_ARCH=linux-gnu PETSC_DIR=/home/dabh/petsc --with-clanguage=cxx --download-hypre=1 --download-f-blas-lapack=1 --download-mpich=1 --with-debugging=0 COPTFLAGS="-O2 -march=native" CXXOPTFLAGS="-O2 -march=native" FOPTFLAGS="-O2 -march=native" --with-log=0
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: PetscCommDestroy() line 234 in src/sys/objects/tagm.c
> [0]PETSC ERROR: PetscHeaderDestroy_Private() line 118 in src/sys/objects/inherit.c
> [0]PETSC ERROR: PetscViewerDestroy() line 107 in src/sys/classes/viewer/interface/view.c
> [0]PETSC ERROR: PetscObjectDestroy() line 73 in src/sys/objects/destroy.c
> [0]PETSC ERROR: PetscObjectRegisterDestroyAll() line 252 in src/sys/objects/destroy.c
> [0]PETSC ERROR: PetscFinalize() line 1146 in src/sys/objects/pinit.c
>
> This is happening on a machine running Fedora 20 (64-bit), compiling things with g++.
>
> If you have any further insights into how to get petsc working properly with openmp, like with this example program, I would be most grateful for your time and suggestions. Thank you again for your help!
>
> Best,
>
> David
>
> ----- Original Message -----
> From: "Barry Smith" <bsmith at mcs.anl.gov>
> To: "D H" <mrhyde at stanford.edu>
> Sent: Tuesday, April 8, 2014 1:49:27 PM
> Subject: Re: [petsc-users] KSP in OpenMP Parallel For Loop
>
>
> On Apr 8, 2014, at 3:39 PM, D H <mrhyde at stanford.edu> wrote:
>
>> Dear Barry,
>>
>> Thanks for your email! I found the barry/make-petscoptionsobject-nonglobal branch and tried playing with it - it sounds like exactly what I need. I wasn't able to get it to compile at first - in src/sys/dll/reg.c, I had to add the following includes at the top:
>> #include <petscthreadcomm.h>
>> #include <petscao.h>
>> #include <petscpc.h>
>> #include <petscts.h>
>> #include <petscdm.h>
>> #include <petsccharacteristic.h>
>> #include <petscsf.h>
>>
>> in order to get all the "...InitializePackage()" calls resolved. PetscThreadCommWorldInitialize() still wasn't being resolved, and I didn't see it in a header file anywhere (just in threadcomm.c), so I added this after line 44 in petscthreadcomm.h: "PETSC_EXTERN PetscErrorCode PetscThreadCommWorldInitialize(void);". After these changes I could get the branch compiling fine.
>
> Not sure about this business.
>>
>> Unfortunately, I'm still getting segfaults when I run my code. Things seem to break at one of three lines in my code:
>> "ierr = PetscOptionsSetValue("-pc_hypre_boomeramg_relax_type_coarse", "SOR/Jacobi"); CHKERRXX(ierr);"
>> "ierr = PCSetType(pc, pc_type); CHKERRXX(ierr);"
>> "ierr = KSPSolve(ksp, b, x); CHKERRXX(ierr);"
>> (it's usually the first line that causes the problem, though)
>
> You need to configure PETSc in this branch with --with-debugging=0 --with-log=0; I forgot to tell you that. The extra debugging and logging use global variables and hence must be turned off.
>
> Barry
>
>>
>> PETSc reports that some memory is being double-freed, although it isn't helpful in pointing out exactly what object is having this memory issue.
>>
>> If it makes a difference, I'm attempting to use BCGS with the HYPRE BoomerAMG preconditioner.
>>
>> Do you have any ideas as to what might be going wrong? Is there anywhere else in PETSc where relevant global variables might be causing the crash? It is possible that the branch is fine and that the problem is in my solver class, but my solver class doesn't have any global variables, so I don't think the problem is there (especially since it works fine in serial).
>>
>> I really appreciate your time and any additional thoughts you might have. Thanks very much!
>>
>> Best,
>>
>> David
>>
>> ----- Original Message -----
>> From: "Barry Smith" <bsmith at mcs.anl.gov>
>> To: "D H" <mrhyde at stanford.edu>
>> Cc: petsc-users at mcs.anl.gov
>> Sent: Saturday, April 5, 2014 11:34:37 AM
>> Subject: Re: [petsc-users] KSP in OpenMP Parallel For Loop
>>
>>
>> There is a branch in the PETSc Bitbucket repository (see http://www.mcs.anl.gov/petsc/developers/index.html) called barry/make-petscoptionsobject-nonglobal in which one can call PETSc operations from threads without any conflicts. Otherwise it just won’t work.
>>
>> Barry
>>
>> On Apr 5, 2014, at 12:25 PM, D H <mrhyde at stanford.edu> wrote:
>>
>>> Hi,
>>>
>>> I have a C++ program where I would like to call some of PETSc's KSP methods (KSPCreate, KSPSolve, etc.) from inside a for loop that has a "#pragma omp parallel for" in front of it. Without this OpenMP pragma, my code runs fine. But when I add in this parallelism, my program segfaults with PETSc reporting some memory corruption errors.
>>>
>>> I've read online in a few places that PETSc is not thread-safe, but before I give up hope, I thought I would ask to see if anyone has had success working with KSP routines when they are being called simultaneously from multiple threads (or whether such a feat is definitely not possible with PETSc). Thanks very much for your advice!
>>>
>>> Best,
>>>
>>> David
>>
>
> <main.cpp>