[petsc-dev] [OMPI users] potential bug with MPI_Win_fence() in openmpi-1.8.4
Satish Balay
balay at mcs.anl.gov
Thu Apr 30 15:56:05 CDT 2015
Great! Thanks for checking.
Satish
On Thu, 30 Apr 2015, George Bosilca wrote:
> I went over the code and in fact I think it is correct as is. The length is
> for the local representation, which indeed uses pointers to the datatype
> structures. In contrast, total_pack_size represents the amount of space we
> would need to store the data in a format that can be sent to another peer,
> in which case handling pointers is pointless and we fall back to int.
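>
> For illustration, a rough sketch of that distinction (simplified names, not
> the actual Open MPI code; MPI_Aint stands in here for OPAL_PTRDIFF_TYPE):
>
> #include <mpi.h>
> #include <stddef.h>
>
> /* Local (in-process) arguments keep real datatype handles, which in
>  * Open MPI are pointers to datatype structures. */
> struct local_args {
>     MPI_Datatype *d;                  /* dc entries, sizeof(MPI_Datatype) each */
>     /* ... integer counts and address arguments ... */
> };
>
> /* The packed form that travels to another peer stores only integer ids,
>  * so pointers never cross process boundaries. */
> static size_t packed_size(int ic, int ac, int dc)
> {
>     return (4 + ic) * sizeof(int)     /* header + integer arguments */
>          + ac * sizeof(MPI_Aint)      /* address arguments */
>          + dc * sizeof(int);          /* datatype ids, not handles */
> }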
>
> However, I think we are counting the space needed for predefined data twice.
> I'll push a patch shortly.
>
> George.
>
>
> On Thu, Apr 30, 2015 at 3:33 PM, George Bosilca <bosilca at icl.utk.edu> wrote:
>
> > In the packed representation we store not MPI_Datatypes but a handcrafted
> > id for each one. The two code paths should have been kept in sync. I'm
> > looking at another issue right now, and I'll come back to this one right after.
> >
> > Thanks for paying attention to the code.
> > George.
> >
> > On Thu, Apr 30, 2015 at 3:13 PM, Satish Balay <balay at mcs.anl.gov> wrote:
> >
> >> Thanks for checking and getting a more appropriate fix in.
> >>
> >> I've just tried this out - and the PETSc test code runs fine with it.
> >>
> >> BTW: There is one inconsistency in ompi/datatype/ompi_datatype_args.c
> >> [that I noticed] that you might want to check.
> >> Perhaps the second line should be "(DC) * sizeof(MPI_Datatype)"?
> >>
> >> >>>>>>>>>
> >> int length = sizeof(ompi_datatype_args_t) + (IC) * sizeof(int) + \
> >>              (AC) * sizeof(OPAL_PTRDIFF_TYPE) + (DC) * sizeof(MPI_Datatype); \
> >>
> >> pArgs->total_pack_size = (4 + (IC)) * sizeof(int) + \
> >>                          (AC) * sizeof(OPAL_PTRDIFF_TYPE) + (DC) * sizeof(int); \
> >> <<<<<<<<<<<
> >>
> >> Satish
> >>
> >>
> >> On Thu, 30 Apr 2015, Matthew Knepley wrote:
> >>
> >> > On Fri, May 1, 2015 at 4:55 AM, Jeff Squyres (jsquyres) <jsquyres at cisco.com> wrote:
> >> >
> >> > > Thank you!
> >> > >
> >> > > George reviewed your patch and adjusted it a bit. We applied it to
> >> > > master, and it's pending for the release series (v1.8.x).
> >> > >
> >> >
> >> > Was this identified by IBM?
> >> >
> >> >
> >> >
> >> > https://github.com/open-mpi/ompi/commit/015d3f56cf749ee5ad9ea4428d2f5da72f9bbe08
> >> >
> >> > Matt
> >> >
> >> >
> >> > > Would you mind testing a nightly master snapshot? It should be in
> >> > > tonight's build:
> >> > >
> >> > > http://www.open-mpi.org/nightly/master/
> >> > >
> >> > >
> >> > >
> >> > > > On Apr 30, 2015, at 12:50 AM, Satish Balay <balay at mcs.anl.gov> wrote:
> >> > > >
> >> > > > OpenMPI developers,
> >> > > >
> >> > > > We've had issues (memory errors) with OpenMPI when running code in the
> >> > > > PETSc library that uses MPI_Win_fence().
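> >> > > >
> >> > > > [For context, the general fence-synchronized RMA pattern looks roughly like
> >> > > > the sketch below -- a minimal illustration only, not the PETSc code and not
> >> > > > a reproducer for this error.]
> >> > > >
> >> > > > #include <mpi.h>
> >> > > >
> >> > > > int main(int argc, char **argv)
> >> > > > {
> >> > > >     int rank, size, buf;
> >> > > >     MPI_Win win;
> >> > > >
> >> > > >     MPI_Init(&argc, &argv);
> >> > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >> > > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
> >> > > >
> >> > > >     buf = rank;                        /* window memory exposed to peers */
> >> > > >     MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
> >> > > >                    MPI_COMM_WORLD, &win);
> >> > > >
> >> > > >     MPI_Win_fence(0, win);             /* open the access/exposure epoch */
> >> > > >     if (rank == 0 && size > 1) {
> >> > > >         int val = 42;
> >> > > >         MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
> >> > > >     }
> >> > > >     MPI_Win_fence(0, win);             /* close the epoch; RMA ops complete */
> >> > > >
> >> > > >     MPI_Win_free(&win);
> >> > > >     MPI_Finalize();
> >> > > >     return 0;
> >> > > > }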
> >> > > >
> >> > > > Valgrind shows memory corruption deep inside the OpenMPI function stack.
> >> > > >
> >> > > > I'm attaching a potential patch that appears to fix this issue for us.
> >> > > > [the corresponding valgrind trace is listed in the patch header]
> >> > > >
> >> > > > Perhaps there is a more appropriate fix for this memory corruption.
> >> > > > Could you check on this?
> >> > > >
> >> > > > [Sorry I don't have a pure MPI test code to demonstrate this error -
> >> > > > but a PETSc test example code consistently reproduces this issue]
> >> > > >
> >> > > > Thanks,
> >> > > > Satish <openmpi-1.8.4.patch>
> >> > >
> >> > >
> >> > > --
> >> > > Jeff Squyres
> >> > > jsquyres at cisco.com
> >> > > For corporate legal information go to:
> >> > > http://www.cisco.com/web/about/doing_business/legal/cri/
> >> > >
> >> > >
> >> >
> >> >
> >> >
> >>
> >> _______________________________________________
> >> users mailing list
> >> users at open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/users/2015/04/26823.php
> >>
> >
> >
>