[petsc-users] OpenMPI 2.0 and Petsc 3.7.2

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Thu Sep 1 08:04:01 CDT 2016


Just to "close" this thread, the offending bug has been found and 
corrected (was with MPI I/O implementation) (see 
https://github.com/open-mpi/ompi/issues/1875).

So with the forthcoming OpenMPI 2.0.1, everything is fine with PETSc for me.
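
To double-check which Open MPI a given binary actually links at run
time, here is a minimal sketch using the standard MPI-3 call
MPI_Get_library_version (illustrative only, not part of our test suite):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
      char ver[MPI_MAX_LIBRARY_VERSION_STRING];
      int  len;

      MPI_Init(&argc, &argv);
      MPI_Get_library_version(ver, &len);
      printf("%s\n", ver); /* should mention 2.0.1 once it is released */
      MPI_Finalize();
      return 0;
    }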

Have a nice day!

Eric


On 25/07/16 03:53 PM, Matthew Knepley wrote:
> On Mon, Jul 25, 2016 at 12:44 PM, Eric Chamberland
> <Eric.Chamberland at giref.ulaval.ca> wrote:
>
>     Ok,
>
>     here are the two points answered:
>
>     #1) got valgrind output... here is the fatal free operation:
>
>
> Okay, this is not the MatMult scatter; this is for local representations
> of ghosted vectors. However, to me it looks like OpenMPI mistakenly
> frees its built-in type for MPI_DOUBLE.
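>
> Illustrative sketch, not PETSc's actual code: the pattern at issue is
> persistent requests built on the built-in MPI_DOUBLE type and later
> released with MPI_Request_free, which is what VecScatterDestroy_PtoP
> is doing in the trace below. A correct MPI must not free the built-in
> datatype in the process, yet the trace shows ompi_mpi_double being
> free()d:
>
>     /* minimal reproducer sketch; run with exactly 2 ranks */
>     #include <mpi.h>
>
>     int main(int argc, char **argv)
>     {
>       double      buf[8];
>       int         rank;
>       MPI_Request req;
>
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       /* persistent requests, as a VecScatter sets up internally */
>       if (rank == 0) MPI_Send_init(buf, 8, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
>       else           MPI_Recv_init(buf, 8, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
>       /* freeing an inactive persistent request is legal and must not
>          touch the built-in datatype */
>       MPI_Request_free(&req);
>       MPI_Finalize();
>       return 0;
>     }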
>
>
>     ==107156== Invalid free() / delete / delete[] / realloc()
>     ==107156==    at 0x4C2A37C: free (in
>     /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>     ==107156==    by 0x1E63CD5F: opal_free (malloc.c:184)
>     ==107156==    by 0x27622627: mca_pml_ob1_recv_request_fini
>     (pml_ob1_recvreq.h:133)
>     ==107156==    by 0x27622C4F: mca_pml_ob1_recv_request_free
>     (pml_ob1_recvreq.c:90)
>     ==107156==    by 0x1D3EF9DC: ompi_request_free (request.h:362)
>     ==107156==    by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
>     ==107156==    by 0x14AE3B9C: VecScatterDestroy_PtoP (vpscat.c:219)
>     ==107156==    by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
>     ==107156==    by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
>     ==107156==    by 0x14A33809: VecDestroy (vector.c:432)
>     ==107156==    by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
>     (girefConfigurationPETSc.h:115)
>     ==107156==    by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
>     (VecteurPETSc.cc:2292)
>     ==107156==    by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
>     (VecteurPETSc.cc:287)
>     ==107156==    by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
>     (VecteurPETSc.cc:281)
>     ==107156==    by 0x1135A57B:
>     PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D()
>     (PPReactionsAppuiEL3D.cc:216)
>     ==107156==    by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
>     /home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
>     ==107156==    by 0x435702: main (Test.ProblemeGD.icc:381)
>     ==107156==  Address 0x1d6acbc0 is 0 bytes inside data symbol
>     "ompi_mpi_double"
>     --107156-- REDIR: 0x1dda2680 (libc.so.6:__GI_stpcpy) redirected to
>     0x4c2f330 (__GI_stpcpy)
>     ==107156==
>     ==107156== Process terminating with default action of signal 6
>     (SIGABRT): dumping core
>     ==107156==    at 0x1DD520C7: raise (in /lib64/libc-2.19.so)
>     ==107156==    by 0x1DD53534: abort (in /lib64/libc-2.19.so)
>     ==107156==    by 0x1DD4B145: __assert_fail_base (in /lib64/libc-2.19.so)
>     ==107156==    by 0x1DD4B1F1: __assert_fail (in /lib64/libc-2.19.so)
>     ==107156==    by 0x27626D12: mca_pml_ob1_send_request_fini
>     (pml_ob1_sendreq.h:221)
>     ==107156==    by 0x276274C9: mca_pml_ob1_send_request_free
>     (pml_ob1_sendreq.c:117)
>     ==107156==    by 0x1D3EF9DC: ompi_request_free (request.h:362)
>     ==107156==    by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
>     ==107156==    by 0x14AE3C3C: VecScatterDestroy_PtoP (vpscat.c:225)
>     ==107156==    by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
>     ==107156==    by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
>     ==107156==    by 0x14A33809: VecDestroy (vector.c:432)
>     ==107156==    by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
>     (girefConfigurationPETSc.h:115)
>     ==107156==    by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
>     (VecteurPETSc.cc:2292)
>     ==107156==    by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
>     (VecteurPETSc.cc:287)
>     ==107156==    by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
>     (VecteurPETSc.cc:281)
>     ==107156==    by 0x1135A57B:
>     PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D()
>     (PPReactionsAppuiEL3D.cc:216)
>     ==107156==    by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
>     /home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
>     ==107156==    by 0x435702: main (Test.ProblemeGD.icc:381)
>
>
>     #2) For the run with -vecscatter_alltoall it works...!
>
>     As an "end user", should I ever modify these VecScatterCreate
>     options? How do they change the performances of the code on large
>     problems?
>
>
> Yep, those options are there because the different variants perform
> better on different architectures, and you can't know which one to
> pick until runtime (and without experimentation).
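>
> For example (hypothetical process count; the executable is one of
> Eric's tests), the same binary can be timed with and without the
> alltoall variant at runtime:
>
>     mpiexec -n 64 ./Test.ProblemeGD.dev -log_view
>     mpiexec -n 64 ./Test.ProblemeGD.dev -vecscatter_alltoall -log_view
>
> and the VecScatterBegin/VecScatterEnd lines of the two -log_view
> summaries compared.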
>
>   Thanks,
>
>     Matt
>
>
>     Thanks,
>
>     Eric
>
>     On 25/07/16 02:57 PM, Matthew Knepley wrote:
>
>         On Mon, Jul 25, 2016 at 11:33 AM, Eric Chamberland
>         <Eric.Chamberland at giref.ulaval.ca> wrote:
>
>             Hi,
>
>             Has someone tried OpenMPI 2.0 with PETSc 3.7.2?
>
>             I am having some errors with PETSc; maybe someone has them too?
>
>             Here are the configure logs for PETSc:
>
>
>         http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_configure.log
>
>
>         http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_RDict.log
>
>             And for OpenMPI:
>
>         http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_config.log
>
>             (In fact, I am testing the ompi-release branch, a sort of
>             petsc-master branch for Open MPI, since I need commit 9ba6678156.)
>
>             For a set of parallel tests, I have 104 that work out of
>             124 total tests.
>
>
>         It appears that the fault happens when freeing the VecScatter
>         we build for MatMult, which contains Request structures for
>         the ISends and IRecvs. These look like internal OpenMPI errors
>         to me, since the Request should be opaque.
>         I would try at least two things:
>
>         1) Run under valgrind.
>
>         2) Switch the VecScatter implementation. All the options are here,
>
>
>         http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate
>
>         but maybe use alltoall.
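>
>         For 1), a typical invocation would look something like this
>         (process count and flags are only a suggestion):
>
>             mpiexec -n 2 valgrind --track-origins=yes ./Test.ProblemeGD.dev
>
>         ideally with a suppression file for known Open MPI false
>         positives, so the real invalid free stands out.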
>
>           Thanks,
>
>              Matt
>
>
>             And the typical error:
>             *** Error in
>
>         `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.dev':
>             free(): invalid pointer:
>             ======= Backtrace: =========
>             /lib64/libc.so.6(+0x7277f)[0x7f80eb11677f]
>             /lib64/libc.so.6(+0x78026)[0x7f80eb11c026]
>             /lib64/libc.so.6(+0x78d53)[0x7f80eb11cd53]
>
>         /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f80ea8f9d60]
>
>         /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f80df0ea628]
>
>         /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f80df0eac50]
>             /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f80eb7029dd]
>
>         /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f80eb702ad6]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f80f2fa6c6d]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f80f2fa1c45]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0xa9d0f5)[0x7f80f35960f5]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f80f35c2588]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x10bf0f4)[0x7f80f3bb80f4]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f80f3a79fd9]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f80f3d1a334]
>
>             a similar one:
>             *** Error in
>
>         `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProbFluideIncompressible.dev':
>             free(): invalid pointer: 0x00007f382a7c5bc0 ***
>             ======= Backtrace: =========
>             /lib64/libc.so.6(+0x7277f)[0x7f3829f1c77f]
>             /lib64/libc.so.6(+0x78026)[0x7f3829f22026]
>             /lib64/libc.so.6(+0x78d53)[0x7f3829f22d53]
>
>         /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f38296ffd60]
>
>         /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f381deab628]
>
>         /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f381deabc50]
>             /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f382a5089dd]
>
>         /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f382a508ad6]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f3831dacc6d]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f3831da7c45]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x9f4755)[0x7f38322f3755]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f38323c8588]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x4e2)[0x7f383287f87a]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f383287ffd9]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f3832b20334]
>
>             another one:
>
>             *** Error in
>
>         `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.MortierDiffusion.dev':
>             free(): invalid pointer: 0x00007f67b6d37bc0 ***
>             ======= Backtrace: =========
>             /lib64/libc.so.6(+0x7277f)[0x7f67b648e77f]
>             /lib64/libc.so.6(+0x78026)[0x7f67b6494026]
>             /lib64/libc.so.6(+0x78d53)[0x7f67b6494d53]
>
>         /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f67b5c71d60]
>
>         /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1adae)[0x7f67aa4cddae]
>
>         /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1b4ca)[0x7f67aa4ce4ca]
>             /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f67b6a7a9dd]
>
>         /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f67b6a7aad6]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adb09)[0x7f67be31eb09]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f67be319c45]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4574f7)[0x7f67be2c84f7]
>
>         /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecDestroy+0x648)[0x7f67be26e8da]
>
>             I feel like I should wait until someone else from PETSc
>             has tested it too...
>
>             Thanks,
>
>             Eric
>
>
>
>
>         --
>         What most experimenters take for granted before they begin their
>         experiments is infinitely more interesting than any results to which
>         their experiments lead.
>         -- Norbert Wiener
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener

