[petsc-users] OpenMPI 2.0 and Petsc 3.7.2
Eric Chamberland
Eric.Chamberland at giref.ulaval.ca
Thu Sep 1 08:04:01 CDT 2016
Just to "close" this thread: the offending bug has been found and
corrected (it was in the MPI I/O implementation; see
https://github.com/open-mpi/ompi/issues/1875).
So with the forthcoming OpenMPI 2.0.1, everything is fine with PETSc for me.
Have a nice day!
Eric
On 25/07/16 03:53 PM, Matthew Knepley wrote:
> On Mon, Jul 25, 2016 at 12:44 PM, Eric Chamberland
> <Eric.Chamberland at giref.ulaval.ca
> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
>
> Ok,
>
> here are the two points answered:
>
> #1) got valgrind output... here is the fatal free operation:
>
>
> Okay, this is not the MatMult scatter; this one is for local
> representations of ghosted vectors. However, to me it looks like
> OpenMPI mistakenly frees its built-in type for MPI_DOUBLE.
>
>
> ==107156== Invalid free() / delete / delete[] / realloc()
> ==107156== at 0x4C2A37C: free (in
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==107156== by 0x1E63CD5F: opal_free (malloc.c:184)
> ==107156== by 0x27622627: mca_pml_ob1_recv_request_fini
> (pml_ob1_recvreq.h:133)
> ==107156== by 0x27622C4F: mca_pml_ob1_recv_request_free
> (pml_ob1_recvreq.c:90)
> ==107156== by 0x1D3EF9DC: ompi_request_free (request.h:362)
> ==107156== by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
> ==107156== by 0x14AE3B9C: VecScatterDestroy_PtoP (vpscat.c:219)
> ==107156== by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
> ==107156== by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
> ==107156== by 0x14A33809: VecDestroy (vector.c:432)
> ==107156== by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
> (girefConfigurationPETSc.h:115)
> ==107156== by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
> (VecteurPETSc.cc:2292)
> ==107156== by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
> (VecteurPETSc.cc:287)
> ==107156== by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
> (VecteurPETSc.cc:281)
> ==107156== by 0x1135A57B:
> PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D()
> (PPReactionsAppuiEL3D.cc:216)
> ==107156== by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
> /home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
> ==107156== by 0x435702: main (Test.ProblemeGD.icc:381)
> ==107156== Address 0x1d6acbc0 is 0 bytes inside data symbol
> "ompi_mpi_double"
> --107156-- REDIR: 0x1dda2680 (libc.so.6:__GI_stpcpy) redirected to
> 0x4c2f330 (__GI_stpcpy)
> ==107156==
> ==107156== Process terminating with default action of signal 6
> (SIGABRT): dumping core
> ==107156== at 0x1DD520C7: raise (in /lib64/libc-2.19.so
> <http://libc-2.19.so>)
> ==107156== by 0x1DD53534: abort (in /lib64/libc-2.19.so
> <http://libc-2.19.so>)
> ==107156== by 0x1DD4B145: __assert_fail_base (in
> /lib64/libc-2.19.so <http://libc-2.19.so>)
> ==107156== by 0x1DD4B1F1: __assert_fail (in /lib64/libc-2.19.so
> <http://libc-2.19.so>)
> ==107156== by 0x27626D12: mca_pml_ob1_send_request_fini
> (pml_ob1_sendreq.h:221)
> ==107156== by 0x276274C9: mca_pml_ob1_send_request_free
> (pml_ob1_sendreq.c:117)
> ==107156== by 0x1D3EF9DC: ompi_request_free (request.h:362)
> ==107156== by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
> ==107156== by 0x14AE3C3C: VecScatterDestroy_PtoP (vpscat.c:225)
> ==107156== by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
> ==107156== by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
> ==107156== by 0x14A33809: VecDestroy (vector.c:432)
> ==107156== by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
> (girefConfigurationPETSc.h:115)
> ==107156== by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
> (VecteurPETSc.cc:2292)
> ==107156== by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
> (VecteurPETSc.cc:287)
> ==107156== by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
> (VecteurPETSc.cc:281)
> ==107156== by 0x1135A57B:
> PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D()
> (PPReactionsAppuiEL3D.cc:216)
> ==107156== by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
> /home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
> ==107156== by 0x435702: main (Test.ProblemeGD.icc:381)
>
>
> #2) For the run with -vecscatter_alltoall it works...!
>
> As an "end user", should I ever modify these VecScatterCreate
> options? How do they affect the performance of the code on large
> problems?
>
>
> Yep, those options are there because the different variants are better
> on different architectures, and you cannot know which one to pick
> before runtime (and without experimentation).
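A quick way to try this without touching the code: PETSc reads these options from the command line when the VecScatter is created. A minimal sketch (the executable name and process count are illustrative; `-vecscatter_alltoall` is the option used in this thread, and `-log_summary` is PETSc 3.7's profiling option, useful for comparing the variants' timings):

```shell
# Run the same job with the default VecScatter and with the
# MPI_Alltoallv-based variant, collecting timings for comparison.
mpiexec -n 8 ./my_app -log_summary
mpiexec -n 8 ./my_app -vecscatter_alltoall -log_summary
```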
>
> Thanks,
>
> Matt
>
>
> Thanks,
>
> Eric
>
> On 25/07/16 02:57 PM, Matthew Knepley wrote:
>
> On Mon, Jul 25, 2016 at 11:33 AM, Eric Chamberland
> <Eric.Chamberland at giref.ulaval.ca
> <mailto:Eric.Chamberland at giref.ulaval.ca>
> <mailto:Eric.Chamberland at giref.ulaval.ca
> <mailto:Eric.Chamberland at giref.ulaval.ca>>> wrote:
>
> Hi,
>
> Has someone tried OpenMPI 2.0 with PETSc 3.7.2?
>
> I am having some errors with PETSc; maybe someone has them too?
>
> Here are the configure logs for PETSc:
>
>
> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_configure.log
>
>
> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_RDict.log
>
> And for OpenMPI:
>
> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_config.log
>
> (in fact, I am testing the ompi-release branch, a sort of
> "petsc-master" branch for OpenMPI, since I need the commit 9ba6678156).
>
> For a set of parallel tests, 104 out of 124 pass.
>
>
> It appears that the fault happens when freeing the VecScatter we build
> for MatMult, which contains Request structures for the ISends and
> IRecvs. These look like internal OpenMPI errors to me, since the
> Request should be opaque.
> I would try at least two things:
>
> 1) Run under valgrind.
>
> 2) Switch the VecScatter implementation. All the options are here,
>
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate
>
> but maybe use alltoall.
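For step 1, one way to run a failing test under valgrind in an MPI job (the binary name is taken from the error messages in this thread; the flags are common memcheck options, nothing PETSc-specific):

```shell
# Each rank gets its own valgrind instance; --track-origins helps show
# where the pointer passed to the invalid free() came from.
mpiexec -n 2 valgrind --track-origins=yes --num-callers=30 \
    ./Test.ProblemeGD.dev
```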
>
> Thanks,
>
> Matt
>
>
> And the typical error:
> *** Error in
>
> `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.dev':
> free(): invalid pointer:
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7277f)[0x7f80eb11677f]
> /lib64/libc.so.6(+0x78026)[0x7f80eb11c026]
> /lib64/libc.so.6(+0x78d53)[0x7f80eb11cd53]
>
> /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f80ea8f9d60]
>
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f80df0ea628]
>
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f80df0eac50]
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f80eb7029dd]
>
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f80eb702ad6]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f80f2fa6c6d]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f80f2fa1c45]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0xa9d0f5)[0x7f80f35960f5]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f80f35c2588]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x10bf0f4)[0x7f80f3bb80f4]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f80f3a79fd9]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f80f3d1a334]
>
> a similar one:
> *** Error in
>
> `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProbFluideIncompressible.dev':
> free(): invalid pointer: 0x00007f382a7c5bc0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7277f)[0x7f3829f1c77f]
> /lib64/libc.so.6(+0x78026)[0x7f3829f22026]
> /lib64/libc.so.6(+0x78d53)[0x7f3829f22d53]
>
> /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f38296ffd60]
>
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f381deab628]
>
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f381deabc50]
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f382a5089dd]
>
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f382a508ad6]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f3831dacc6d]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f3831da7c45]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x9f4755)[0x7f38322f3755]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f38323c8588]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x4e2)[0x7f383287f87a]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f383287ffd9]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f3832b20334]
>
> another one:
>
> *** Error in
>
> `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.MortierDiffusion.dev':
> free(): invalid pointer: 0x00007f67b6d37bc0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7277f)[0x7f67b648e77f]
> /lib64/libc.so.6(+0x78026)[0x7f67b6494026]
> /lib64/libc.so.6(+0x78d53)[0x7f67b6494d53]
>
> /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f67b5c71d60]
>
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1adae)[0x7f67aa4cddae]
>
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1b4ca)[0x7f67aa4ce4ca]
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f67b6a7a9dd]
>
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f67b6a7aad6]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adb09)[0x7f67be31eb09]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f67be319c45]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4574f7)[0x7f67be2c84f7]
>
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecDestroy+0x648)[0x7f67be26e8da]
>
> I feel like I should wait until someone else from PETSc has tested
> it too...
>
> Thanks,
>
> Eric
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
>