[mpich2-dev] 0 byte derived types
Dave Goodell
goodell at mcs.anl.gov
Mon Apr 23 10:19:12 CDT 2012
The assert you quote in the oldest email in the thread is in PAMI code, so I don't think this is directly an MPICH2 bug. I do agree, however, that zero-size, non-zero-count bcast should be tested in our test suite and that we have a coverage gap here.
I'll add creating a test for this to my TODO list.
-Dave
On Apr 23, 2012, at 9:23 AM CDT, Jeff Hammond wrote:
> A bug appeared when BGP came online and has reappeared on BGQ. It
> relates to MPI_Bcast of a non-zero number of 0-byte derived datatypes.
> ScaLAPACK is one source of this pattern. They have a workaround, but
> it seems to me that either ScaLAPACK is using MPI in a non-compliant
> way or there is a bug in MPICH2 that has persisted across many major
> version releases.
>
> Are you guys aware of this? Has it been fixed in 1.5? Is there a
> test to make sure there is no regression in the future? The ScaLAPACK
> code and comment noting the problem with MPICH is below, as is a long
> email thread with Brian Smith and Lee on this topic.
>
> Thanks,
>
> Jeff
>
>
>
> #include "Bdef.h"
> MPI_Datatype BI_GetMpiGeType(BLACSCONTEXT *ctxt, int m, int n, int lda,
>                              MPI_Datatype Dtype, int *N)
> {
>    int info;
>    MPI_Datatype GeType;
>
> /*
>  * Some versions of mpich and its derivatives cannot handle 0 byte typedefs,
>  * so we set type MPI_BYTE as a flag for a 0 byte message
>  */
> #ifdef ZeroByteTypeBug
>    if ( (m < 1) || (n < 1) )
>    {
>       *N = 0;
>       return (MPI_BYTE);
>    }
> #endif
>    *N = 1;
>    info=MPI_Type_vector(n, m, lda, Dtype, &GeType);
>    info=MPI_Type_commit(&GeType);
>
>    return(GeType);
> }
>
>
> ---------- Forwarded message ----------
> From: Brian Smith <smithbr at us.ibm.com>
> Date: Mon, Apr 23, 2012 at 8:32 AM
> Subject: Re: [td-support #113586] PAMI assertions
> To: Jeff Hammond <jhammond at alcf.anl.gov>
> Cc: jeff.science at gmail.com, Lee Killough <killough at alcf.anl.gov>,
> "td-support at alcf.anl.gov" <td-support at alcf.anl.gov>
>
>
> Well, there are 2 different bugs here.
>
> (from memory) 1) We found places where SCALAPACK made assumptions about
> uninitialized variables that caused significant badness in a number of
> apps. I believe someone reported this to the SCALAPACK maintainers
> many years ago. In fact, they ran something like valgrind and provided
> a patch for *all* of the usage of uninitialized variables. The
> SCALAPACK people did not integrate the changes at the time. Perhaps
> this new release will have some of them added so we don't have to deal
> with that again. See
> http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=588&p=1911&hilit=trsm#p1911
> and
> http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=2625
>
> (more recently, based on different glue punt-to-MPICH-logic) 2)
> SCALAPACK creates zero-length datatypes and then calls bcast with
> nonzero counts. The glue didn't test for this condition.
>
> A test for #2 in the MPICH2 test bucket wouldn't be a bad thing, but
> #1 is of course beyond the scope of MPICH2. It is possible there is a
> test for #2 already but because of other circumstances (node count for
> example) we might not have seen a failure. I'm sure Dave G. or someone
> could comment on that.
>
>
>
> Brian Smith (smithbr at us.ibm.com)
> BlueGene MPI Development/
> Communications Team Lead
> IBM Master Inventor
> IBM Rochester
> Phone: 507 253 4717
>
>
>
>
> From: Jeff Hammond <jhammond at alcf.anl.gov>
> To: Brian Smith/Rochester/IBM at IBMUS
> Cc: Lee Killough <killough at alcf.anl.gov>,
> "td-support at alcf.anl.gov" <td-support at alcf.anl.gov>
> Date: 04/21/2012 08:51 PM
> Subject: Re: [td-support #113586] PAMI assertions
> Sent by: jeff.science at gmail.com
> ________________________________
>
>
>
> If this showed up on BGP and now on BGQ, why was it not added to the
> MPICH2 test suite 3+ years ago? This is a bug in MPICH2 according to
> the comments in ScaLAPACK and the fact that both BGP and BGQ suffered
> it despite forking vastly different code bases, right?
>
> I'm trying to write a standalone test for this, btw, but haven't been
> successful yet.
>
> Jeff
>
> On Sat, Apr 21, 2012 at 8:46 PM, Brian Smith <smithbr at us.ibm.com> wrote:
>> Hi Lee,
>>
>> It's not actually a user error, what SCALAPACK is doing is (probably, I
>> didn't look at it too much) valid MPI code. However, it appears to be a
>> weird fringe case that none of the test cases that come with MPICH, nor the
>> gigantic Intel/ANL testbucket found.
>>
>> Basically, we were missing an if() check in the collectives glue to check
>> for nonzero counts of zero length datatypes. The optimized protocols don't
>> deal with things like that which is why there was an assert().
>>
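
The missing check described above might look roughly like the sketch below. It is not the actual pamid glue or PAMI code: glue_bcast, optimized_bcast and the bcast_path enum are hypothetical stand-ins, and the real fix may well fall back to the generic MPICH path rather than return early.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, for illustration only. */
enum bcast_path { BCAST_NOTHING_TO_SEND = 0, BCAST_OPTIMIZED = 1 };

/* Stand-in for an optimized protocol that, like postShortCollective,
 * asserts that the payload is non-empty. */
static enum bcast_path optimized_bcast(size_t bytes)
{
    assert(bytes > 0);
    return BCAST_OPTIMIZED;
}

/* Hypothetical glue: filter out empty payloads (zero count OR zero-size
 * datatype) before handing the message to the optimized protocol. */
static enum bcast_path glue_bcast(int count, size_t type_size)
{
    if (count <= 0 || type_size == 0)
        return BCAST_NOTHING_TO_SEND;
    return optimized_bcast((size_t)count * type_size);
}
```

With this guard, glue_bcast(1, 0) never reaches the assert, whereas a check on count alone would still let a zero-byte payload through.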
>>
>>
>>
>> "the scientific community is very A-Buzz with positive reviews of Blue Gene
>> ..." - Charles Archer - un-sung hero of technology
>>
>>
>>
>>
>> From: Lee Killough <killough at alcf.anl.gov>
>> To: Jeff Hammond <jhammond at alcf.anl.gov>
>> Cc: "td-support at alcf.anl.gov" <td-support at alcf.anl.gov>, Brian
>> Smith/Rochester/IBM at IBMUS
>> Date: 04/20/2012 11:08 PM
>> Subject: Re: [td-support #113586] PAMI assertions
>> ________________________________
>>
>>
>>
>> Sorry, a busy evening after 6 pm, fast forwarding to this email, and have
>> not read previous.
>>
>> First, if it's a user error, it should never be diagnosed in an assert().
>> assert() is only intended for catching internal errors, and should be turned
>> off in production code. It being an assert() immediately threw me off and
>> made me think it was a configuration issue or mismatched libraries, etc.
>>
>> A new version of ScaLAPACK is about to be released, maybe even as I send
>> this email. I have been working closely with the developers for the past
>> month on several bugs, some of which are only seen on BG, such as illegal
>> Fortran calls with overlapping arguments.
>>
>> If we can identify the cause and work on a fix for this bcast problem, I may
>> be able to get it in before the next release, or maybe not. If you have a
>> BLACS code fix suggestion, please send it and I'll try to get the fix in the
>> next version of ScaLAPACK.
>>
>> Thanks,
>> Lee
>>
>> On Apr 20, 2012, at 22:44, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>>
>>> IBM says it's a ScaLAPACK problem but that the latest MPI/PAMI has a
>>> fix anyways.
>>>
>>> See if this makes sense from the BLACS code. We can look through MPI
>>> standard together next week to see if BLACS violates it. It would be
>>> the first time Clint Whaley completely screwed up using MPI.
>>>
>>> Jeff
>>>
>>> ---------- Forwarded message ----------
>>> From: Brian Smith <smithbr at us.ibm.com>
>>> Date: Fri, Apr 20, 2012 at 8:47 PM
>>> Subject: Re: Fwd: [td-support #113586] PAMI assertions
>>> To: Jeff Hammond <jhammond at alcf.anl.gov>
>>> Cc: jeff.science at gmail.com, Michael Blocksome <blocksom at us.ibm.com>
>>>
>>> It's goofy datatype stuff in SCALAPACK. There's a fix in head... I
>>> didn't/don't feel it was worth efixing.
>>>
>>> I forget if the problem was a nonzero count with a zero-byte
>>> constructed datatype or a zero count with a non-zero byte constructed
>>> datatype, something stupid like that, so it's unlikely a real
>>> application is going to hit it.
>>>
>>>
>>>
>>>
>>> On Fri, Apr 20, 2012 at 7:34 PM, Jeff Hammond <jhammond at alcf.anl.gov>
>>> wrote:
>>>> 0-byte bcast is fine with the MPI I always use.
>>>>
>>>> Can you print the args at
>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/BLACS/SRC/dgebr2d_.c:127
>>>> and see what I need to test?
>>>>
>>>> Thanks,
>>>>
>>>> Jeff
>>>>
>>>> On Fri, Apr 20, 2012 at 7:18 PM, Jeff Hammond <jhammond at alcf.anl.gov>
>>>> wrote:
>>>>> Assuming I am looking at the same code (PAMI Git head is V1R1M1
>>>>> already and I'm too Git-impaired to toggle for V1R1M0, nor do I want
>>>>> to do this in any case), the assertion that fails indicates a 0-byte
>>>>> message is being attempted.
>>>>>
>>>>> I'll write a test of 0-byte MPI_Bcast right now. Which MPI library
>>>>> are you linking against?
>>>>>
>>>>> Jeff
>>>>>
>>>>> =================================================
>>>>> pami_result_t postShortCollective (uint32_t opcode,
>>>>>                                    uint32_t sizeoftype,
>>>>>                                    uint32_t bytes,
>>>>>                                    char * src,
>>>>>                                    PipeWorkQueue * dpwq,
>>>>>                                    pami_event_function cb_done,
>>>>>                                    void * cookie,
>>>>>                                    unsigned classroute)
>>>>> {
>>>>>   TRACE_FN_ENTER();
>>>>>   TRACE_FORMAT("opcode %u, sizeoftype %u, bytes %u, src %p, dpwq %p, classroute %u",
>>>>>                opcode, sizeoftype, bytes, src, dpwq, classroute);
>>>>>   PAMI_assert (bytes <= _collstate._tempSize);
>>>>>   PAMI_assert(bytes); /* <--- JEFF: This is line 284 */
>>>>>   _int64Cpy(_collstate._tempBuf, src, bytes);
>>>>>   //memcpy(_collstate._tempBuf, src, bytes);
>>>>>   ...
>>>>> =================================================
>>>>>
>>>>> On Fri, Apr 20, 2012 at 6:06 PM, Lee Killough <killough at alcf.anl.gov>
>>>>> wrote:
>>>>>> With the new GA driver, I'm getting a lot of PAMI assertions when
>>>>>> running
>>>>>> ScaLAPACK programs. The traceback:
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> Program : ./xzsep
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> +++ID Rank: 0, TGID: 1, Core: 0, HWTID:0 TID: 1 State: RUN
>>>>>>
>>>>>> 00000000016a3638
>>>>>> abort
>>>>>>
>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/stdlib/abort.c:77
>>>>>>
>>>>>> 000000000169c668
>>>>>> __assert_fail
>>>>>>
>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/assert/assert.c:81
>>>>>>
>>>>>> 000000000149774c
>>>>>> PAMI::Device::MU::CollectiveDmaModelBase::postShortCollective(unsigned
>>>>>> int, unsigned int, unsigned int, char*, PAMI::PipeWorkQueue*, void
>>>>>> (*)(void*, void*, pami_result_t), void*, unsigned int)
>>>>>>
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/components/devices/bgq/mu2/model/CollectiveDmaModelBase.h:284
>>>>>>
>>>>>> 0000000001497954
>>>>>>
>>>>>> PAMI::Device::MU::CollectiveMulticastDmaModel::postMulticastImmediate_impl(unsigned
>>>>>> long, unsigned long, pami_multicast_t*, void*)
>>>>>>
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/components/devices/bgq/mu2/model/CollectiveMulticastDmaModel.h:107
>>>>>>
>>>>>> 0000000001394304
>>>>>>
>>>>>> PAMI::Geometry::Algorithm<PAMI::Geometry::Common>::generate(pami_xfer_t*)
>>>>>>
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/algorithms/geometry/Algorithm.h:45
>>>>>>
>>>>>> 0000000001359dec
>>>>>> MPIDO_Bcast
>>>>>>
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/lib/dev/mpich2/src/mpid/pamid/src/coll/bcast/mpido_bcast.c:146
>>>>>>
>>>>>> 00000000012f95cc
>>>>>> MPIR_Bcast_impl
>>>>>>
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/lib/dev/mpich2/src/mpi/coll/bcast.c:1310
>>>>>>
>>>>>> 00000000012f997c
>>>>>> PMPI_Bcast
>>>>>>
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/lib/dev/mpich2/src/mpi/coll/bcast.c:1464
>>>>>>
>>>>>> 000000000101b980
>>>>>> dgebr2d
>>>>>>
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/BLACS/SRC/dgebr2d_.c:127
>>>>>>
>>>>>> 00000000010670f4
>>>>>> pdlared1d
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/SRC/pdlared1d.f:156
>>>>>>
>>>>>> 0000000001045350
>>>>>> pzheevx
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/SRC/pzheevx.f:839
>>>>>>
>>>>>> 00000000010084e0
>>>>>> pzsepsubtst
>>>>>>
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzsepsubtst.f:396
>>>>>>
>>>>>> 0000000001002874
>>>>>> pzseptst
>>>>>>
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzseptst.f:565
>>>>>>
>>>>>> 00000000010123ec
>>>>>> pzsepreq
>>>>>>
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzsepreq.f:205
>>>>>>
>>>>>> 00000000010112e8
>>>>>> pzsepdriver
>>>>>>
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzsepdriver.f:229
>>>>>>
>>>>>> 0000000001699b08
>>>>>> generic_start_main
>>>>>>
>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/csu/../csu/libc-start.c:226
>>>>>>
>>>>>> 0000000001699e04
>>>>>> __libc_start_main
>>>>>>
>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:194
>>>>>>
>>>>>> 0000000000000000
>>>>>> ??
>>>>>> ??:0
>>>>>>
>>>>>> It looks like an assertion is failing at:
>>>>>>
>>>>>> PAMI::Device::MU::CollectiveDmaModelBase::postShortCollective(unsigned
>>>>>> int, unsigned int, unsigned int, char*, PAMI::PipeWorkQueue*, void
>>>>>> (*)(void*, void*, pami_result_t), void*, unsigned int)
>>>>>>
>>>>>>
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/components/devices/bgq/mu2/model/CollectiveDmaModelBase.h:284
>>>>>>
>>>>>> during a broadcast.
>>>>>>
>>>>>> I don't recall these errors in the previous driver.
>>>>>>
>>>>>> Lee
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Hammond
>>>>> Argonne Leadership Computing Facility
>>>>> University of Chicago Computation Institute
>>>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>>> http://www.linkedin.com/in/jeffhammond
>>>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond (in-progress)
>>>>> https://wiki.alcf.anl.gov/old/index.php/User:Jhammond (deprecated)
>>>>> https://wiki-old.alcf.anl.gov/index.php/User:Jhammond(deprecated)