[petsc-users] OpenMPI 2.0 and Petsc 3.7.2
Eric Chamberland
Eric.Chamberland at giref.ulaval.ca
Mon Jul 25 14:44:57 CDT 2016
Ok,
here are the two points answered:
#1) Got the valgrind output... here is the fatal free operation:
==107156== Invalid free() / delete / delete[] / realloc()
==107156== at 0x4C2A37C: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==107156== by 0x1E63CD5F: opal_free (malloc.c:184)
==107156== by 0x27622627: mca_pml_ob1_recv_request_fini
(pml_ob1_recvreq.h:133)
==107156== by 0x27622C4F: mca_pml_ob1_recv_request_free
(pml_ob1_recvreq.c:90)
==107156== by 0x1D3EF9DC: ompi_request_free (request.h:362)
==107156== by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
==107156== by 0x14AE3B9C: VecScatterDestroy_PtoP (vpscat.c:219)
==107156== by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
==107156== by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
==107156== by 0x14A33809: VecDestroy (vector.c:432)
==107156== by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
(girefConfigurationPETSc.h:115)
==107156== by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
(VecteurPETSc.cc:2292)
==107156== by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
(VecteurPETSc.cc:287)
==107156== by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
(VecteurPETSc.cc:281)
==107156== by 0x1135A57B:
PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D() (PPReactionsAppuiEL3D.cc:216)
==107156== by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
/home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
==107156== by 0x435702: main (Test.ProblemeGD.icc:381)
==107156== Address 0x1d6acbc0 is 0 bytes inside data symbol
"ompi_mpi_double"
--107156-- REDIR: 0x1dda2680 (libc.so.6:__GI_stpcpy) redirected to
0x4c2f330 (__GI_stpcpy)
==107156==
==107156== Process terminating with default action of signal 6
(SIGABRT): dumping core
==107156== at 0x1DD520C7: raise (in /lib64/libc-2.19.so)
==107156== by 0x1DD53534: abort (in /lib64/libc-2.19.so)
==107156== by 0x1DD4B145: __assert_fail_base (in /lib64/libc-2.19.so)
==107156== by 0x1DD4B1F1: __assert_fail (in /lib64/libc-2.19.so)
==107156== by 0x27626D12: mca_pml_ob1_send_request_fini
(pml_ob1_sendreq.h:221)
==107156== by 0x276274C9: mca_pml_ob1_send_request_free
(pml_ob1_sendreq.c:117)
==107156== by 0x1D3EF9DC: ompi_request_free (request.h:362)
==107156== by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
==107156== by 0x14AE3C3C: VecScatterDestroy_PtoP (vpscat.c:225)
==107156== by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
==107156== by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
==107156== by 0x14A33809: VecDestroy (vector.c:432)
==107156== by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&)
(girefConfigurationPETSc.h:115)
==107156== by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc()
(VecteurPETSc.cc:2292)
==107156== by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc()
(VecteurPETSc.cc:287)
==107156== by 0x10BA9F48: VecteurPETSc::~VecteurPETSc()
(VecteurPETSc.cc:281)
==107156== by 0x1135A57B:
PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D() (PPReactionsAppuiEL3D.cc:216)
==107156== by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in
/home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
==107156== by 0x435702: main (Test.ProblemeGD.icc:381)
#2) For the run with -vecscatter_alltoall it works...!
As an "end user", should I ever modify these VecScatterCreate options?
How do they change the performance of the code on large problems?
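(For reference, these are run-time options, so switching the scatter implementation needs no code change. A hypothetical invocation, using the test binary named in the traces below and an assumed rank count:)

```shell
# Select the MPI_Alltoall-based VecScatter implementation at run time
mpiexec -n 4 ./Test.ProblemeGD.dev -vecscatter_alltoall
```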
Thanks,
Eric
On 25/07/16 02:57 PM, Matthew Knepley wrote:
> On Mon, Jul 25, 2016 at 11:33 AM, Eric Chamberland
> <Eric.Chamberland at giref.ulaval.ca
> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
>
> Hi,
>
> has someone tried OpenMPI 2.0 with Petsc 3.7.2?
>
> I am having some errors with petsc, maybe someone have them too?
>
> Here are the configure logs for PETSc:
>
> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_configure.log
>
> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_RDict.log
>
> And for OpenMPI:
> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_config.log
>
> (in fact, I am testing the ompi-release branch, a sort of
> petsc-master branch, since I need the commit 9ba6678156).
>
> For a set of parallel tests, 104 pass out of 124 total tests.
>
>
> It appears that the fault happens when freeing the VecScatter we build
> for MatMult, which contains Request structures
> for the ISends and IRecvs. These look like internal OpenMPI errors to
> me since the Request should be opaque.
> I would try at least two things:
>
> 1) Run under valgrind.
>
> 2) Switch the VecScatter implementation. All the options are here,
>
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate
>
> but maybe use alltoall.
>
> Thanks,
>
> Matt
>
>
> And the typical error:
> *** Error in
> `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.dev':
> free(): invalid pointer:
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7277f)[0x7f80eb11677f]
> /lib64/libc.so.6(+0x78026)[0x7f80eb11c026]
> /lib64/libc.so.6(+0x78d53)[0x7f80eb11cd53]
> /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f80ea8f9d60]
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f80df0ea628]
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f80df0eac50]
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f80eb7029dd]
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f80eb702ad6]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f80f2fa6c6d]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f80f2fa1c45]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0xa9d0f5)[0x7f80f35960f5]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f80f35c2588]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x10bf0f4)[0x7f80f3bb80f4]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f80f3a79fd9]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f80f3d1a334]
>
> a similar one:
> *** Error in
> `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProbFluideIncompressible.dev':
> free(): invalid pointer: 0x00007f382a7c5bc0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7277f)[0x7f3829f1c77f]
> /lib64/libc.so.6(+0x78026)[0x7f3829f22026]
> /lib64/libc.so.6(+0x78d53)[0x7f3829f22d53]
> /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f38296ffd60]
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f381deab628]
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f381deabc50]
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f382a5089dd]
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f382a508ad6]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f3831dacc6d]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f3831da7c45]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x9f4755)[0x7f38322f3755]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f38323c8588]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x4e2)[0x7f383287f87a]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f383287ffd9]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f3832b20334]
>
> another one:
>
> *** Error in
> `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.MortierDiffusion.dev':
> free(): invalid pointer: 0x00007f67b6d37bc0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7277f)[0x7f67b648e77f]
> /lib64/libc.so.6(+0x78026)[0x7f67b6494026]
> /lib64/libc.so.6(+0x78d53)[0x7f67b6494d53]
> /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f67b5c71d60]
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1adae)[0x7f67aa4cddae]
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1b4ca)[0x7f67aa4ce4ca]
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f67b6a7a9dd]
> /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f67b6a7aad6]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adb09)[0x7f67be31eb09]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f67be319c45]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4574f7)[0x7f67be2c84f7]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecDestroy+0x648)[0x7f67be26e8da]
>
> I feel like I should wait until someone else from Petsc has tested
> it too...
>
> Thanks,
>
> Eric
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener