[petsc-users] OpenMPI 2.0 and Petsc 3.7.2

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Mon Jul 25 14:44:57 CDT 2016


Ok,

here are the two points answered:

#1) got valgrind output... here is the fatal free operation:

==107156== Invalid free() / delete / delete[] / realloc()
==107156==    at 0x4C2A37C: free (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==107156==    by 0x1E63CD5F: opal_free (malloc.c:184)
==107156==    by 0x27622627: mca_pml_ob1_recv_request_fini 
(pml_ob1_recvreq.h:133)
==107156==    by 0x27622C4F: mca_pml_ob1_recv_request_free 
(pml_ob1_recvreq.c:90)
==107156==    by 0x1D3EF9DC: ompi_request_free (request.h:362)
==107156==    by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
==107156==    by 0x14AE3B9C: VecScatterDestroy_PtoP (vpscat.c:219)
==107156==    by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
==107156==    by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
==107156==    by 0x14A33809: VecDestroy (vector.c:432)
==107156==    by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&) 
(girefConfigurationPETSc.h:115)
==107156==    by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc() 
(VecteurPETSc.cc:2292)
==107156==    by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc() 
(VecteurPETSc.cc:287)
==107156==    by 0x10BA9F48: VecteurPETSc::~VecteurPETSc() 
(VecteurPETSc.cc:281)
==107156==    by 0x1135A57B: 
PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D() (PPReactionsAppuiEL3D.cc:216)
==107156==    by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in 
/home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
==107156==    by 0x435702: main (Test.ProblemeGD.icc:381)
==107156==  Address 0x1d6acbc0 is 0 bytes inside data symbol 
"ompi_mpi_double"
--107156-- REDIR: 0x1dda2680 (libc.so.6:__GI_stpcpy) redirected to 
0x4c2f330 (__GI_stpcpy)
==107156==
==107156== Process terminating with default action of signal 6 
(SIGABRT): dumping core
==107156==    at 0x1DD520C7: raise (in /lib64/libc-2.19.so)
==107156==    by 0x1DD53534: abort (in /lib64/libc-2.19.so)
==107156==    by 0x1DD4B145: __assert_fail_base (in /lib64/libc-2.19.so)
==107156==    by 0x1DD4B1F1: __assert_fail (in /lib64/libc-2.19.so)
==107156==    by 0x27626D12: mca_pml_ob1_send_request_fini 
(pml_ob1_sendreq.h:221)
==107156==    by 0x276274C9: mca_pml_ob1_send_request_free 
(pml_ob1_sendreq.c:117)
==107156==    by 0x1D3EF9DC: ompi_request_free (request.h:362)
==107156==    by 0x1D3EFAD5: PMPI_Request_free (prequest_free.c:59)
==107156==    by 0x14AE3C3C: VecScatterDestroy_PtoP (vpscat.c:225)
==107156==    by 0x14ADEB74: VecScatterDestroy (vscat.c:1860)
==107156==    by 0x14A8D426: VecDestroy_MPI (pdvec.c:25)
==107156==    by 0x14A33809: VecDestroy (vector.c:432)
==107156==    by 0x10A2A5AB: GIREFVecDestroy(_p_Vec*&) 
(girefConfigurationPETSc.h:115)
==107156==    by 0x10BA9F14: VecteurPETSc::detruitObjetPETSc() 
(VecteurPETSc.cc:2292)
==107156==    by 0x10BA9D0D: VecteurPETSc::~VecteurPETSc() 
(VecteurPETSc.cc:287)
==107156==    by 0x10BA9F48: VecteurPETSc::~VecteurPETSc() 
(VecteurPETSc.cc:281)
==107156==    by 0x1135A57B: 
PPReactionsAppuiEL3D::~PPReactionsAppuiEL3D() (PPReactionsAppuiEL3D.cc:216)
==107156==    by 0xCD9A1EA: ProblemeGD::~ProblemeGD() (in 
/home/mefpp_ericc/depots_prepush/GIREF/lib/libgiref_dev_Formulation.so)
==107156==    by 0x435702: main (Test.ProblemeGD.icc:381)


#2) For the run with -vecscatter_alltoall it works...!

As an "end user", should I ever modify these VecScatterCreate options? 
How do they change the performance of the code on large problems?
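For reference, these implementations can be switched at runtime without code changes. A sketch (the binary name and process count are placeholders; the option names come from the VecScatterCreate manual page and the exact set may differ between PETSc versions):

```shell
# Hypothetical invocation: ./my_petsc_app stands in for the real test binary.
# Select the all-to-all VecScatter communication pattern at runtime:
mpiexec -n 8 ./my_petsc_app -vecscatter_alltoall

# Other runtime variants documented on the VecScatterCreate manual page:
#   -vecscatter_ssend       use MPI_Ssend (synchronous mode) for sends
#   -vecscatter_rsend       use ready-receiver (MPI_Rsend) mode
#   -vecscatter_sendfirst   post the sends before the receives
```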

Thanks,

Eric

On 25/07/16 02:57 PM, Matthew Knepley wrote:
> On Mon, Jul 25, 2016 at 11:33 AM, Eric Chamberland
> <Eric.Chamberland at giref.ulaval.ca
> <mailto:Eric.Chamberland at giref.ulaval.ca>> wrote:
>
>     Hi,
>
>     has someone tried OpenMPI 2.0 with Petsc 3.7.2?
>
>     I am having some errors with PETSc; maybe someone has them too?
>
>     Here are the configure logs for PETSc:
>
>     http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_configure.log
>
>     http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_RDict.log
>
>     And for OpenMPI:
>     http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.25.01h16m02s_config.log
>
>     (in fact, I am testing the ompi-release branch, OpenMPI's
>     equivalent of the petsc-master branch, since I need commit 9ba6678156).
>
>     For a set of parallel tests, I have 104 that pass out of 124 total tests.
>
>
> It appears that the fault happens when freeing the VecScatter we build
> for MatMult, which contains Request structures
> for the ISends and IRecvs. These look like internal OpenMPI errors to
> me, since the Requests should be opaque.
> I would try at least two things:
>
> 1) Run under valgrind.
>
> 2) Switch the VecScatter implementation. All the options are here,
>
>   http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecScatterCreate.html#VecScatterCreate
>
> but maybe use alltoall.
>
>   Thanks,
>
>      Matt
>
>
>     And the typical error:
>     *** Error in
>     `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.dev':
>     free(): invalid pointer:
>     ======= Backtrace: =========
>     /lib64/libc.so.6(+0x7277f)[0x7f80eb11677f]
>     /lib64/libc.so.6(+0x78026)[0x7f80eb11c026]
>     /lib64/libc.so.6(+0x78d53)[0x7f80eb11cd53]
>     /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f80ea8f9d60]
>     /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f80df0ea628]
>     /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f80df0eac50]
>     /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f80eb7029dd]
>     /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f80eb702ad6]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f80f2fa6c6d]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f80f2fa1c45]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0xa9d0f5)[0x7f80f35960f5]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f80f35c2588]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x10bf0f4)[0x7f80f3bb80f4]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPReset+0x502)[0x7f80f3d19779]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x11707f7)[0x7f80f3c697f7]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x346)[0x7f80f3a796de]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f80f3a79fd9]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f80f3d1a334]
>
>     a similar one:
>     *** Error in
>     `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProbFluideIncompressible.dev':
>     free(): invalid pointer: 0x00007f382a7c5bc0 ***
>     ======= Backtrace: =========
>     /lib64/libc.so.6(+0x7277f)[0x7f3829f1c77f]
>     /lib64/libc.so.6(+0x78026)[0x7f3829f22026]
>     /lib64/libc.so.6(+0x78d53)[0x7f3829f22d53]
>     /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f38296ffd60]
>     /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16628)[0x7f381deab628]
>     /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x16c50)[0x7f381deabc50]
>     /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f382a5089dd]
>     /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f382a508ad6]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adc6d)[0x7f3831dacc6d]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f3831da7c45]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x9f4755)[0x7f38322f3755]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(MatDestroy+0x648)[0x7f38323c8588]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCReset+0x4e2)[0x7f383287f87a]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(PCDestroy+0x5d1)[0x7f383287ffd9]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(KSPDestroy+0x7b6)[0x7f3832b20334]
>
>     another one:
>
>     *** Error in
>     `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.MortierDiffusion.dev':
>     free(): invalid pointer: 0x00007f67b6d37bc0 ***
>     ======= Backtrace: =========
>     /lib64/libc.so.6(+0x7277f)[0x7f67b648e77f]
>     /lib64/libc.so.6(+0x78026)[0x7f67b6494026]
>     /lib64/libc.so.6(+0x78d53)[0x7f67b6494d53]
>     /opt/openmpi-2.x_opt/lib/libopen-pal.so.20(opal_free+0x1f)[0x7f67b5c71d60]
>     /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1adae)[0x7f67aa4cddae]
>     /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x1b4ca)[0x7f67aa4ce4ca]
>     /opt/openmpi-2.x_opt/lib/libmpi.so.20(+0x9f9dd)[0x7f67b6a7a9dd]
>     /opt/openmpi-2.x_opt/lib/libmpi.so.20(MPI_Request_free+0xf7)[0x7f67b6a7aad6]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adb09)[0x7f67be31eb09]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f67be319c45]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4574f7)[0x7f67be2c84f7]
>     /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecDestroy+0x648)[0x7f67be26e8da]
>
>     I feel like I should wait until someone else from PETSc has tested
>     it too...
>
>     Thanks,
>
>     Eric
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener