[petsc-users] Combining MPIUNI with duplicated comms

Barry Smith bsmith at mcs.anl.gov
Mon Oct 5 15:46:56 CDT 2015


  Arnem,

   Please find attached a patch that allows you to create, use, and free additional communicators with MPI-Uni.  This is also in the development branch barry/fix-mpiuni; once it is fully tested we will put it into the maint branch and it will be in the next patch release.

   Please let us know if the patch does not resolve your difficulties.

  Barry

Note that there is an arbitrary limit of 120 communicators allowed.
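
To give a rough idea of the approach (an illustrative sketch only, not the attached patch; all names here are made up for illustration): a serial MPI stub can hand out communicator handles from a fixed-size table, which is where a cap like the one above comes from.

    /* Sketch of a serial MPI_Comm_dup/MPI_Comm_free pair backed by a
       fixed-size table of handles.  Illustrative only. */
    typedef int MPI_Comm;
    #define MPI_COMM_NULL  0
    #define MPI_SUCCESS    0
    #define MPI_ERR_INTERN 16
    #define MAX_COMM       120            /* the arbitrary cap */

    static int comm_in_use[MAX_COMM];     /* 1 if the slot is taken */

    int MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm)
    {
      (void)comm;                         /* serial: every comm has one rank */
      for (int i = 0; i < MAX_COMM; i++) {
        if (!comm_in_use[i]) {            /* first free slot */
          comm_in_use[i] = 1;
          *newcomm = i + 1;               /* handles are 1..MAX_COMM */
          return MPI_SUCCESS;
        }
      }
      return MPI_ERR_INTERN;              /* table exhausted */
    }

    int MPI_Comm_free(MPI_Comm *comm)
    {
      if (*comm >= 1 && *comm <= MAX_COMM) comm_in_use[*comm - 1] = 0;
      *comm = MPI_COMM_NULL;
      return MPI_SUCCESS;
    }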

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-mpiuni.patch
Type: application/octet-stream
Size: 3638 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20151005/86fcd13c/attachment.obj>
-------------- next part --------------

> On Oct 5, 2015, at 7:13 AM, Arne Morten Kvarving <arne.morten.kvarving at sintef.no> wrote:
> 
> hi there.
> 
> first of all, i'm not 100% sure this is supposed to work. if it is not, feel free to tell me just that and i'll find a way to work around it.
> 
> in our code we use MPI_Comm_dup for various reasons. this works fine, except when i run in serial, i.e., using the MPIUNI wrapper. if the PETSC_COMM_SELF communicator is duplicated, nastiness results when tearing down the KSP object. while this isn't really fatal as such, it is rather annoying.
> 
> please consider http://www.math.ntnu.no/~arnemort/bug-self.tar.bz2, which is a silly code i wrote to reproduce the issue. a 3x3 matrix and a corresponding 1-vector are constructed, the system is solved, and then the objects are torn down.
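> 
> in outline the code does something like this (an illustrative sketch based on the description above, not the exact contents of the tarball; error checking omitted):
> 
>     #include <petscksp.h>
>     #include <string.h>
> 
>     int main(int argc, char **argv)
>     {
>       PetscInitialize(&argc, &argv, NULL, NULL);
> 
>       /* duplicate PETSC_COMM_SELF or PETSC_COMM_WORLD, depending on
>          the required command line parameter ('self' or 'world') */
>       MPI_Comm comm;
>       MPI_Comm_dup(!strcmp(argv[1], "self") ? PETSC_COMM_SELF
>                                             : PETSC_COMM_WORLD, &comm);
> 
>       Mat A;                                   /* the 3x3 system */
>       MatCreate(comm, &A);
>       MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 3, 3);
>       MatSetUp(A);
>       for (PetscInt i = 0; i < 3; i++)
>         MatSetValue(A, i, i, 1.0, INSERT_VALUES);
>       MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>       MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
> 
>       Vec b, x;                                /* the 1-vector */
>       VecCreate(comm, &b);
>       VecSetSizes(b, PETSC_DECIDE, 3);
>       VecSetUp(b);
>       VecSet(b, 1.0);
>       VecDuplicate(b, &x);
> 
>       KSP ksp;                                 /* solve the system */
>       KSPCreate(comm, &ksp);
>       KSPSetOperators(ksp, A, A);
>       KSPSolve(ksp, b, x);
> 
>       KSPDestroy(&ksp);      /* with 'self', teardown trips the error below */
>       VecDestroy(&x);
>       VecDestroy(&b);
>       MatDestroy(&A);
>       MPI_Comm_free(&comm);
>       PetscFinalize();
>       return 0;
>     }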
> 
> the tarball has a cmake buildsystem that relies on the standard PETSC_DIR / PETSC_ARCH env variables. there is no requirement to use it if you prefer to build by hand; just watch out for the define it adds.
> 
> the application takes one required command line parameter, which should be 'self' or 'world'; this controls which comm is duplicated. if i run with self, i get
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Corrupt argument: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: Inner MPI_Comm does not have expected reference to outer comm
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.6.2, Oct, 02, 2015
> [0]PETSC ERROR: ./bug_comm_self on a linux-gnu-cxx-dbg named akvarium by akva Mon Oct  5 14:05:52 2015
> [0]PETSC ERROR: Configure options --with-precision=double --with-scalar-type=real --with-debugging=1 --with-blas-lib=/usr/lib/libblas.a --with-lapack-lib=/usr/lib/liblapack.a --with-64-bit-indices=0 --with-clanguage=c++ --with-mpi=0 --LIBS=-ldl --with-ml=0 --with-shared-libraries=0
> [0]PETSC ERROR: #1 Petsc_DelComm_Outer() line 360 in /home/akva/kode/petsc/petsc-3.6.2/src/sys/objects/pinit.c
> 
> if i run with world, no error is generated.
> 
> this is with the recently released 3.6.2, but i can reproduce it at least back to 3.5.2 (haven't tried older).
> 
> i also noticed that if i add a VecView call, errors are also generated with world. this line is commented out in the source code. that fact might be valuable info for those who know the system.
> 
> since i made it reproducible through the test app, i skipped logs for now. if they or some other info is required from my side, i'll gladly provide it.
> 
> thanks in advance
> 
> arnem


