[mpich2-dev] 0 byte derived types

Dave Goodell goodell at mcs.anl.gov
Mon Apr 23 10:19:12 CDT 2012


The assert you quote in the oldest email in the thread is in PAMI code, so I don't think this is directly an MPICH2 bug.  I do agree, however, that zero-size, non-zero-count bcast should be tested in our test suite and that we have a coverage gap here.

I'll add creating a test for this to my TODO list.
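
For reference, the condition such a test needs to exercise can be sketched without MPI. The snippet below is only an illustration in plain C, and both vector_type_size and bcast_is_empty are hypothetical names, not MPICH2 or PAMI functions. The first mirrors how a BLACS-style MPI_Type_vector(n, m, lda, ...) type ends up with zero bytes when m or n is less than 1. The second is one possible form of the glue-level check for non-zero counts of zero-length datatypes that is mentioned later in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: packed size in bytes of a type built like
 * MPI_Type_vector(n, m, lda, elem, ...), i.e. n blocks of m elements.
 * The stride (lda) changes the extent, not the size, so it is omitted.
 * Treating m < 1 or n < 1 as empty follows the BLACS check quoted below. */
static size_t vector_type_size(int n, int m, size_t elem_size)
{
    if (n < 1 || m < 1)
        return 0;   /* degenerate matrix: zero-byte datatype */
    return (size_t)n * (size_t)m * elem_size;
}

/* Hypothetical glue-level guard: returns 1 when a broadcast of `count`
 * items of `type_size` bytes carries no data, so the caller can return
 * early instead of handing a 0-byte message to an optimized protocol. */
static int bcast_is_empty(int count, size_t type_size)
{
    assert(count >= 0);   /* negative counts are a caller error */
    return count == 0 || type_size == 0;
}
```

The case reported in this thread corresponds to bcast_is_empty(1, vector_type_size(0, m, sizeof(double))): a count of 1 with a zero-byte type, which a check on count alone would not catch.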

-Dave

On Apr 23, 2012, at 9:23 AM CDT, Jeff Hammond wrote:

> A bug appeared when BGP came online and has reappeared on BGQ.  It
> relates to MPI_Bcast of a non-zero number of 0-byte derived datatypes.
> ScaLAPACK is one source of this pattern.  They have a workaround, but
> it seems that either ScaLAPACK is using MPI in a non-compliant
> way or there is a bug in MPICH2 that has persisted across many major
> version releases.
> 
> Are you guys aware of this?  Has it been fixed in 1.5?  Is there a
> test to make sure there is no regression in the future?  The ScaLAPACK
> code and comment noting the problem with MPICH is below, as is a long
> email thread with Brian Smith and Lee on this topic.
> 
> Thanks,
> 
> Jeff
> 
> 
> 
> #include "Bdef.h"
> MPI_Datatype BI_GetMpiGeType(BLACSCONTEXT *ctxt, int m, int n, int lda,
>                               MPI_Datatype Dtype, int *N)
> {
>  int info;
>  MPI_Datatype GeType;
> 
> /*
> * Some versions of mpich and its derivatives cannot handle 0 byte typedefs,
> * so we set type MPI_BYTE as a flag for a 0 byte message
> */
> #ifdef ZeroByteTypeBug
>  if ( (m < 1) || (n < 1) )
>  {
>     *N = 0;
>     return (MPI_BYTE);
>  }
> #endif
>  *N = 1;
>  info=MPI_Type_vector(n, m, lda, Dtype, &GeType);
>  info=MPI_Type_commit(&GeType);
> 
>  return(GeType);
> }
> 
> 
> ---------- Forwarded message ----------
> From: Brian Smith <smithbr at us.ibm.com>
> Date: Mon, Apr 23, 2012 at 8:32 AM
> Subject: Re: [td-support #113586] PAMI assertions
> To: Jeff Hammond <jhammond at alcf.anl.gov>
> Cc: jeff.science at gmail.com, Lee Killough <killough at alcf.anl.gov>,
> "td-support at alcf.anl.gov" <td-support at alcf.anl.gov>
> 
> 
> Well, there are 2 different bugs here.
> 
> (from memory) 1) We found places where SCALAPACK made assumptions about
> uninitialized variables that caused significant badness in a number of
> apps. I believe someone reported this to the SCALAPACK maintainers
> many years ago. In fact, they ran something like valgrind and provided
> a patch for *all* of the usage of uninitialized variables. The
> SCALAPACK people did not integrate the changes at the time. Perhaps
> this new release will have some of them added so we don't have to deal
> with that again. See
> http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=588&p=1911&hilit=trsm#p1911
> and
> http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=2625
> 
> (more recently, based on different glue punt-to-MPICH-logic) 2)
> SCALAPACK creates zero-length datatypes and then calls bcast with
> nonzero counts. The glue didn't test for this condition.
> 
> A test for #2 in the MPICH2 test bucket wouldn't be a bad thing, but
> #1 is of course beyond the scope of MPICH2. It is possible there is a
> test for #2 already but because of other circumstances (node count for
> example) we might not have seen a failure. I'm sure Dave G. or someone
> could comment on that.
> 
> 
> 
> Brian Smith (smithbr at us.ibm.com)
> BlueGene MPI Development/
> Communications Team Lead
> IBM Master Inventor
> IBM Rochester
> Phone: 507 253 4717
> 
> 
> 
> 
> From:        Jeff Hammond <jhammond at alcf.anl.gov>
> To:        Brian Smith/Rochester/IBM at IBMUS
> Cc:        Lee Killough <killough at alcf.anl.gov>,
> "td-support at alcf.anl.gov" <td-support at alcf.anl.gov>
> Date:        04/21/2012 08:51 PM
> Subject:        Re: [td-support #113586] PAMI assertions
> Sent by:        jeff.science at gmail.com
> ________________________________
> 
> 
> 
> If this showed up on BGP and now on BGQ, why was it not added to the
> MPICH2 test suite 3+ years ago?  This is a bug in MPICH2 according to
> the comments in ScaLAPACK and the fact that both BGP and BGQ suffered
> it despite forking vastly different code bases, right?
> 
> I'm trying to write a standalone test for this, btw, but haven't been
> successful yet.
> 
> Jeff
> 
> On Sat, Apr 21, 2012 at 8:46 PM, Brian Smith <smithbr at us.ibm.com> wrote:
>> Hi Lee,
>> 
>> It's not actually a user error, what SCALAPACK is doing is (probably, I
>> didn't look at it too much) valid MPI code. However, it appears to be a
>> weird fringe case that none of the test cases that come with MPICH, nor the
>> gigantic Intel/ANL testbucket found.
>> 
>> Basically, we were missing an if() check in the collectives glue to check
>> for nonzero counts of zero length datatypes. The optimized protocols don't
>> deal with things like that which is why there was an assert().
>> 
>> 
>> 
>> Brian Smith (smithbr at us.ibm.com)
>> BlueGene MPI Development/
>> Communications Team Lead
>> IBM Master Inventor
>> IBM Rochester
>> Phone: 507 253 4717
>> 
>> "the scientific community is very A-Buzz with positive reviews of Blue Gene
>> ..." - Charles Archer - un-sung hero of technology
>> 
>> 
>> 
>> 
>> From:        Lee Killough <killough at alcf.anl.gov>
>> To:        Jeff Hammond <jhammond at alcf.anl.gov>
>> Cc:        "td-support at alcf.anl.gov" <td-support at alcf.anl.gov>, Brian
>> Smith/Rochester/IBM at IBMUS
>> Date:        04/20/2012 11:08 PM
>> Subject:        Re: [td-support #113586] PAMI assertions
>> ________________________________
>> 
>> 
>> 
>> Sorry, a busy evening after 6 pm, fast forwarding to this email, and have
>> not read previous.
>> 
>> First, if it's a user error, it should never be diagnosed in an assert().
>> assert() is only intended for catching internal errors, and should be turned
>> off in production code. It being an assert() immediately threw me off and
>> made me think it was a configuration issue or mismatched libraries, etc.
>> 
>> A new version of ScaLAPACK is about to be released, maybe even as I send
>> this email. I have been working closely with the developers for the past
>> month on several bugs, some of which are only seen on BG, such as illegal
>> Fortran calls with overlapping arguments.
>> 
>> If we can identify the cause and work on a fix for this bcast problem, I may
>> be able to get it in before the next release, or maybe not. If you have a
>> BLACS code fix suggestion, please send it and I'll try to get the fix in the
>> next version of ScaLAPACK.
>> 
>> Thanks,
>> Lee
>> 
>> On Apr 20, 2012, at 22:44, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>> 
>>> IBM says it's a ScaLAPACK problem but that the latest MPI/PAMI has a
>>> fix anyways.
>>> 
>>> See if this makes sense from the BLACS code.  We can look through MPI
>>> standard together next week to see if BLACS violates it.  It would be
>>> the first time Clint Whaley completely screwed up using MPI.
>>> 
>>> Jeff
>>> 
>>> ---------- Forwarded message ----------
>>> From: Brian Smith <smithbr at us.ibm.com>
>>> Date: Fri, Apr 20, 2012 at 8:47 PM
>>> Subject: Re: Fwd: [td-support #113586] PAMI assertions
>>> To: Jeff Hammond <jhammond at alcf.anl.gov>
>>> Cc: jeff.science at gmail.com, Michael Blocksome <blocksom at us.ibm.com>
>>> 
>>> It's goofy datatype stuff in SCALAPACK. There's a fix in head... I
>>> didn't/don't feel it was worth efixing.
>>> 
>>> I forget if the problem was a nonzero count with a zero-byte
>>> constructed datatype or a zero count with a non-zero byte constructed
>>> datatype, something stupid like that, so it's unlikely a real
>>> application is going to hit it.
>>> 
>>> Brian Smith (smithbr at us.ibm.com)
>>> BlueGene MPI Development/
>>> Communications Team Lead
>>> IBM Master Inventor
>>> IBM Rochester
>>> Phone: 507 253 4717
>>> 
>>> 
>>> 
>>> On Fri, Apr 20, 2012 at 7:34 PM, Jeff Hammond <jhammond at alcf.anl.gov>
>>> wrote:
>>>> 0-byte bcast is fine with the MPI I always use.
>>>> 
>>>> Can you print the args at
>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/BLACS/SRC/dgebr2d_.c:127
>>>> and see what I need to test?
>>>> 
>>>> Thanks,
>>>> 
>>>> Jeff
>>>> 
>>>> On Fri, Apr 20, 2012 at 7:18 PM, Jeff Hammond <jhammond at alcf.anl.gov>
>>>> wrote:
>>>>> Assuming I am looking at the same code (PAMI Git head is V1R1M1
>>>>> already and I'm too Git-impaired to toggle for V1R1M0, nor do I want
>>>>> to do this in any case), the assertion that fails indicates a 0-byte
>>>>> message is being attempted.
>>>>> 
>>>>> I'll write a test of 0-byte MPI_Bcast right now.  Which MPI library
>>>>> are you linking against?
>>>>> 
>>>>> Jeff
>>>>> 
>>>>> =================================================
>>>>>          pami_result_t  postShortCollective (uint32_t        opcode,
>>>>>                                              uint32_t        sizeoftype,
>>>>>                                              uint32_t        bytes,
>>>>>                                              char          * src,
>>>>>                                              PipeWorkQueue * dpwq,
>>>>>                                              pami_event_function cb_done,
>>>>>                                              void          * cookie,
>>>>>                                              unsigned        classroute)
>>>>>          {
>>>>>            TRACE_FN_ENTER();
>>>>>            TRACE_FORMAT("opcode %u, sizeoftype %u, bytes %u, src %p, dpwq %p, classroute %u", opcode, sizeoftype, bytes, src, dpwq, classroute);
>>>>>            PAMI_assert (bytes <= _collstate._tempSize);
>>>>>            PAMI_assert(bytes);  /* <--- JEFF: This is line 284 */
>>>>>            _int64Cpy(_collstate._tempBuf, src, bytes);
>>>>>            //memcpy(_collstate._tempBuf, src, bytes);
>>>>> ...
>>>>> =================================================
>>>>> 
>>>>> On Fri, Apr 20, 2012 at 6:06 PM, Lee Killough <killough at alcf.anl.gov>
>>>>> wrote:
>>>>>> With the new GA driver, I'm getting a lot of PAMI assertions when
>>>>>> running
>>>>>> ScaLAPACK programs. The traceback:
>>>>>> 
>>>>>> 
>>>>>> ------------------------------------------------------------------------
>>>>>> Program   : ./xzsep
>>>>>> 
>>>>>> ------------------------------------------------------------------------
>>>>>> +++ID Rank: 0, TGID: 1, Core: 0, HWTID:0 TID: 1 State: RUN
>>>>>> 
>>>>>> 00000000016a3638
>>>>>> abort
>>>>>> 
>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/stdlib/abort.c:77
>>>>>> 
>>>>>> 000000000169c668
>>>>>> __assert_fail
>>>>>> 
>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/assert/assert.c:81
>>>>>> 
>>>>>> 000000000149774c
>>>>>> PAMI::Device::MU::CollectiveDmaModelBase::postShortCollective(unsigned
>>>>>> int, unsigned int, unsigned int, char*, PAMI::PipeWorkQueue*, void
>>>>>> (*)(void*, void*, pami_result_t), void*, unsigned int)
>>>>>> 
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/components/devices/bgq/mu2/model/CollectiveDmaModelBase.h:284
>>>>>> 
>>>>>> 0000000001497954
>>>>>> 
>>>>>> PAMI::Device::MU::CollectiveMulticastDmaModel::postMulticastImmediate_impl(unsigned
>>>>>> long, unsigned long, pami_multicast_t*, void*)
>>>>>> 
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/components/devices/bgq/mu2/model/CollectiveMulticastDmaModel.h:107
>>>>>> 
>>>>>> 0000000001394304
>>>>>> 
>>>>>> PAMI::Geometry::Algorithm<PAMI::Geometry::Common>::generate(pami_xfer_t*)
>>>>>> 
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/algorithms/geometry/Algorithm.h:45
>>>>>> 
>>>>>> 0000000001359dec
>>>>>> MPIDO_Bcast
>>>>>> 
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/lib/dev/mpich2/src/mpid/pamid/src/coll/bcast/mpido_bcast.c:146
>>>>>> 
>>>>>> 00000000012f95cc
>>>>>> MPIR_Bcast_impl
>>>>>> 
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/lib/dev/mpich2/src/mpi/coll/bcast.c:1310
>>>>>> 
>>>>>> 00000000012f997c
>>>>>> PMPI_Bcast
>>>>>> 
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/lib/dev/mpich2/src/mpi/coll/bcast.c:1464
>>>>>> 
>>>>>> 000000000101b980
>>>>>> dgebr2d
>>>>>> 
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/BLACS/SRC/dgebr2d_.c:127
>>>>>> 
>>>>>> 00000000010670f4
>>>>>> pdlared1d
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/SRC/pdlared1d.f:156
>>>>>> 
>>>>>> 0000000001045350
>>>>>> pzheevx
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/SRC/pzheevx.f:839
>>>>>> 
>>>>>> 00000000010084e0
>>>>>> pzsepsubtst
>>>>>> 
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzsepsubtst.f:396
>>>>>> 
>>>>>> 0000000001002874
>>>>>> pzseptst
>>>>>> 
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzseptst.f:565
>>>>>> 
>>>>>> 00000000010123ec
>>>>>> pzsepreq
>>>>>> 
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzsepreq.f:205
>>>>>> 
>>>>>> 00000000010112e8
>>>>>> pzsepdriver
>>>>>> 
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzsepdriver.f:229
>>>>>> 
>>>>>> 0000000001699b08
>>>>>> generic_start_main
>>>>>> 
>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/csu/../csu/libc-start.c:226
>>>>>> 
>>>>>> 0000000001699e04
>>>>>> __libc_start_main
>>>>>> 
>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:194
>>>>>> 
>>>>>> 0000000000000000
>>>>>> ??
>>>>>> ??:0
>>>>>> 
>>>>>> It looks like an assertion is failing at:
>>>>>> 
>>>>>> PAMI::Device::MU::CollectiveDmaModelBase::postShortCollective(unsigned
>>>>>> int, unsigned int, unsigned int, char*, PAMI::PipeWorkQueue*, void
>>>>>> (*)(void*, void*, pami_result_t), void*, unsigned int)
>>>>>> 
>>>>>> 
>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/components/devices/bgq/mu2/model/CollectiveDmaModelBase.h:284
>>>>>> 
>>>>>> during a broadcast.
>>>>>> 
>>>>>> I don't recall these errors in the previous driver.
>>>>>> 
>>>>>> Lee
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Jeff Hammond
>>>>> Argonne Leadership Computing Facility
>>>>> University of Chicago Computation Institute
>>>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>>> http://www.linkedin.com/in/jeffhammond
>>>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond (in-progress)
>>>>> https://wiki.alcf.anl.gov/old/index.php/User:Jhammond (deprecated)
>>>>> https://wiki-old.alcf.anl.gov/index.php/User:Jhammond(deprecated)
>> 
>> 
> 


