[mpich2-dev] 0 byte derived types
Dave Goodell
goodell at mcs.anl.gov
Mon May 7 14:23:02 CDT 2012
FYI, I've added a test for this in r9835. Stock MPICH2 passed it, although I made a few fixes in r9836 to avoid doing some pointless communication. It looks like there was a potential bug in the heterogeneous support case (now fixed), but all of that support has been untested for years.
-Dave
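
For readers skimming the thread, the corner case can be sketched in plain C without an MPI installation. This is only an illustration, not the test added in r9835 and not the PAMI glue fix. The helper names `vector_payload_bytes` and `bcast_is_empty` are hypothetical. The first computes how many bytes a type built like MPI_Type_vector(n, m, lda, ...) would carry, and the second is the kind of check a collectives layer could make before handing a broadcast to an optimized protocol that cannot take zero bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: payload of n blocks of m elements each.
 * The stride (lda in the BLACS code) changes the layout, not the
 * number of bytes carried. Dimensions below 1 are treated as empty,
 * as in the BLACS workaround. */
size_t vector_payload_bytes(int m, int n, size_t elem_size)
{
    if (m < 1 || n < 1)
        return 0;
    return (size_t)n * (size_t)m * elem_size;
}

/* Hypothetical guard: a broadcast moves no data when the count is
 * zero OR when each element of the datatype is zero bytes long.
 * Checking only the count misses the second case. */
bool bcast_is_empty(int count, size_t type_size)
{
    return count <= 0 || type_size == 0;
}
```

With these helpers, a 0-by-4 matrix of doubles gives vector_payload_bytes(0, 4, sizeof(double)) == 0, so bcast_is_empty(1, 0) is true even though the count is 1. That is the non-zero-count, zero-size case discussed below.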
On Apr 23, 2012, at 10:22 AM CDT, Jeff Hammond wrote:
> If you read Brian's email, he indicates that there is a problem with
> PAMI, but this is also a problem at the MPI level, if one is to
> believe the comment in BLACS. Lee noted that he had to work around
> this problem with NEC-MPI as well, so it appears to be a corner case
> that is overlooked in multiple implementations (for good reason; this
> use case is ridiculous).
>
> Thanks,
>
> Jeff
>
> On Mon, Apr 23, 2012 at 10:19 AM, Dave Goodell <goodell at mcs.anl.gov> wrote:
>> The assert you quote in the oldest email in the thread is in PAMI code, so I don't think this is directly an MPICH2 bug. I do agree, however, that zero-size, non-zero-count bcast should be tested in our test suite and that we have a coverage gap here.
>>
>> I'll add creating a test for this to my TODO list.
>>
>> -Dave
>>
>> On Apr 23, 2012, at 9:23 AM CDT, Jeff Hammond wrote:
>>
>>> A bug appeared when BGP came online and has reappeared on BGQ. It
>>> relates to MPI_Bcast of a non-zero number of 0-byte derived datatypes.
>>> ScaLAPACK is one source of this pattern. They have a workaround, but
>>> it seems to me that either ScaLAPACK is using MPI in a non-compliant
>>> way or there is a bug in MPICH2 that has persisted across many major
>>> version releases.
>>>
>>> Are you guys aware of this? Has it been fixed in 1.5? Is there a
>>> test to make sure there is no regression in the future? The ScaLAPACK
>>> code and comment noting the problem with MPICH is below, as is a long
>>> email thread with Brian Smith and Lee on this topic.
>>>
>>> Thanks,
>>>
>>> Jeff
>>>
>>>
>>>
>>> #include "Bdef.h"
>>> MPI_Datatype BI_GetMpiGeType(BLACSCONTEXT *ctxt, int m, int n, int lda,
>>>                              MPI_Datatype Dtype, int *N)
>>> {
>>>    int info;
>>>    MPI_Datatype GeType;
>>>
>>> /*
>>>  * Some versions of mpich and its derivitives cannot handle 0 byte typedefs,
>>>  * so we set type MPI_BYTE as a flag for a 0 byte message
>>>  */
>>> #ifdef ZeroByteTypeBug
>>>    if ( (m < 1) || (n < 1) )
>>>    {
>>>       *N = 0;
>>>       return (MPI_BYTE);
>>>    }
>>> #endif
>>>    *N = 1;
>>>    info=MPI_Type_vector(n, m, lda, Dtype, &GeType);
>>>    info=MPI_Type_commit(&GeType);
>>>
>>>    return(GeType);
>>> }
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Brian Smith <smithbr at us.ibm.com>
>>> Date: Mon, Apr 23, 2012 at 8:32 AM
>>> Subject: Re: [td-support #113586] PAMI assertions
>>> To: Jeff Hammond <jhammond at alcf.anl.gov>
>>> Cc: jeff.science at gmail.com, Lee Killough <killough at alcf.anl.gov>,
>>> "td-support at alcf.anl.gov" <td-support at alcf.anl.gov>
>>>
>>>
>>> Well, there are 2 different bugs here.
>>>
>>> (from memory) 1) We found places where SCALAPACK made assumptions about
>>> uninitialized variables that caused significant badness in a number of
>>> apps. I believe someone reported this to the SCALAPACK maintainers
>>> many years ago. In fact, they ran something like valgrind and provided
>>> a patch for *all* of the usage of uninitialized variables. The
>>> SCALAPACK people did not integrate the changes at the time. Perhaps
>>> this new release will have some of them added so we don't have to deal
>>> with that again. See
>>> http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=2&t=588&p=1911&hilit=trsm#p1911
>>> and
>>> http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=2625
>>>
>>> (more recently, based on different glue punt-to-MPICH-logic) 2)
>>> SCALAPACK creates zero-length datatypes and then calls bcast with
>>> nonzero counts. The glue didn't test for this condition.
>>>
>>> A test for #2 in the MPICH2 test bucket wouldn't be a bad thing, but
>>> #1 is of course beyond the scope of MPICH2. It is possible there is a
>>> test for #2 already but because of other circumstances (node count for
>>> example) we might not have seen a failure. I'm sure Dave G. or someone
>>> could comment on that.
>>>
>>>
>>>
>>> Brian Smith (smithbr at us.ibm.com)
>>> BlueGene MPI Development/
>>> Communications Team Lead
>>> IBM Master Inventor
>>> IBM Rochester
>>> Phone: 507 253 4717
>>>
>>>
>>>
>>>
>>> From: Jeff Hammond <jhammond at alcf.anl.gov>
>>> To: Brian Smith/Rochester/IBM at IBMUS
>>> Cc: Lee Killough <killough at alcf.anl.gov>,
>>> "td-support at alcf.anl.gov" <td-support at alcf.anl.gov>
>>> Date: 04/21/2012 08:51 PM
>>> Subject: Re: [td-support #113586] PAMI assertions
>>> Sent by: jeff.science at gmail.com
>>> ________________________________
>>>
>>>
>>>
>>> If this showed up on BGP and now on BGQ, why was it not added to the
>>> MPICH2 test suite 3+ years ago? This is a bug in MPICH2 according to
>>> the comments in ScaLAPACK and the fact that both BGP and BGQ suffered
>>> it despite forking vastly different code bases, right?
>>>
>>> I'm trying to write a standalone test for this, btw, but haven't been
>>> successful yet.
>>>
>>> Jeff
>>>
>>> On Sat, Apr 21, 2012 at 8:46 PM, Brian Smith <smithbr at us.ibm.com> wrote:
>>>> Hi Lee,
>>>>
>>>> It's not actually a user error, what SCALAPACK is doing is (probably, I
>>>> didn't look at it too much) valid MPI code. However, it appears to be a
>>>> weird fringe case that none of the test cases that come with MPICH, nor the
>>>> gigantic Intel/ANL testbucket found.
>>>>
>>>> Basically, we were missing an if() check in the collectives glue to check
>>>> for nonzero counts of zero length datatypes. The optimized protocols don't
>>>> deal with things like that which is why there was an assert().
>>>>
>>>>
>>>>
>>>> Brian Smith (smithbr at us.ibm.com)
>>>> BlueGene MPI Development/
>>>> Communications Team Lead
>>>> IBM Master Inventor
>>>> IBM Rochester
>>>> Phone: 507 253 4717
>>>>
>>>> "the scientific community is very A-Buzz with positive reviews of Blue Gene
>>>> ..." - Charles Archer - un-sung hero of technology
>>>>
>>>>
>>>>
>>>>
>>>> From: Lee Killough <killough at alcf.anl.gov>
>>>> To: Jeff Hammond <jhammond at alcf.anl.gov>
>>>> Cc: "td-support at alcf.anl.gov" <td-support at alcf.anl.gov>, Brian
>>>> Smith/Rochester/IBM at IBMUS
>>>> Date: 04/20/2012 11:08 PM
>>>> Subject: Re: [td-support #113586] PAMI assertions
>>>> ________________________________
>>>>
>>>>
>>>>
>>>> Sorry, a busy evening after 6 pm, fast forwarding to this email, and have
>>>> not read previous.
>>>>
>>>> First, if it's a user error, it should never be diagnosed in an assert().
>>>> assert() is only intended for catching internal errors, and should be turned
>>>> off in production code. It being an assert() immediately threw me off and
>>>> made me think it was a configuration issue or mismatched libraries, etc.
>>>>
>>>> A new version of ScaLAPACK is about to be released, maybe even as I send
>>>> this email. I have been working closely with the developers for the past
>>>> month on several bugs, some of which are only seen on BG, such as illegal
>>>> Fortran calls with overlapping arguments.
>>>>
>>>> If we can identify the cause and work on a fix for this bcast problem, I may
>>>> be able to get it in before the next release, or maybe not. If you have a
>>>> BLACS code fix suggestion, please send it and I'll try to get the fix in the
>>>> next version of ScaLAPACK.
>>>>
>>>> Thanks,
>>>> Lee
>>>>
>>>> On Apr 20, 2012, at 22:44, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>>>>
>>>>> IBM says it's a ScaLAPACK problem but that the latest MPI/PAMI has a
>>>>> fix anyways.
>>>>>
>>>>> See if this makes sense from the BLACS code. We can look through MPI
>>>>> standard together next week to see if BLACS violates it. It would be
>>>>> the first time Clint Whaley completely screwed up using MPI.
>>>>>
>>>>> Jeff
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Brian Smith <smithbr at us.ibm.com>
>>>>> Date: Fri, Apr 20, 2012 at 8:47 PM
>>>>> Subject: Re: Fwd: [td-support #113586] PAMI assertions
>>>>> To: Jeff Hammond <jhammond at alcf.anl.gov>
>>>>> Cc: jeff.science at gmail.com, Michael Blocksome <blocksom at us.ibm.com>
>>>>>
>>>>> It's goofy datatype stuff in SCALAPACK. There's a fix in head... I
>>>>> didn't/don't feel it was worth efixing.
>>>>>
>>>>> I forget if the problem was a nonzero count with a zero-byte
>>>>> constructed datatype or a zero count with a non-zero byte constructed
>>>>> datatype, something stupid like that, so it's unlikely a real
>>>>> application is going to hit it.
>>>>>
>>>>> Brian Smith (smithbr at us.ibm.com)
>>>>> BlueGene MPI Development/
>>>>> Communications Team Lead
>>>>> IBM Master Inventor
>>>>> IBM Rochester
>>>>> Phone: 507 253 4717
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 20, 2012 at 7:34 PM, Jeff Hammond <jhammond at alcf.anl.gov>
>>>>> wrote:
>>>>>> 0-byte bcast is fine with the MPI I always use.
>>>>>>
>>>>>> Can you print the args at
>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/BLACS/SRC/dgebr2d_.c:127
>>>>>> and see what I need to test?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>>> On Fri, Apr 20, 2012 at 7:18 PM, Jeff Hammond <jhammond at alcf.anl.gov>
>>>>>> wrote:
>>>>>>> Assuming I am looking at the same code (PAMI Git head is V1R1M1
>>>>>>> already and I'm too Git-impaired to toggle for V1R1M0, nor do I want
>>>>>>> to do this in any case), the assertion that fails indicates a 0-byte
>>>>>>> message is being attempted.
>>>>>>>
>>>>>>> I'll write a test of 0-byte MPI_Bcast right now. Which MPI library
>>>>>>> are you linking against?
>>>>>>>
>>>>>>> Jeff
>>>>>>>
>>>>>>> =================================================
>>>>>>> pami_result_t postShortCollective (uint32_t opcode,
>>>>>>>                                    uint32_t sizeoftype,
>>>>>>>                                    uint32_t bytes,
>>>>>>>                                    char * src,
>>>>>>>                                    PipeWorkQueue * dpwq,
>>>>>>>                                    pami_event_function cb_done,
>>>>>>>                                    void * cookie,
>>>>>>>                                    unsigned classroute)
>>>>>>> {
>>>>>>>   TRACE_FN_ENTER();
>>>>>>>   TRACE_FORMAT("opcode %u, sizeoftype %u, bytes %u, src %p, dpwq %p, classroute %u",
>>>>>>>                opcode, sizeoftype, bytes, src, dpwq, classroute);
>>>>>>>   PAMI_assert (bytes <= _collstate._tempSize);
>>>>>>>   PAMI_assert(bytes); /* <--- JEFF: This is line 284 */
>>>>>>>   _int64Cpy(_collstate._tempBuf, src, bytes);
>>>>>>>   //memcpy(_collstate._tempBuf, src, bytes);
>>>>>>>   ...
>>>>>>> =================================================
>>>>>>>
>>>>>>> On Fri, Apr 20, 2012 at 6:06 PM, Lee Killough <killough at alcf.anl.gov>
>>>>>>> wrote:
>>>>>>>> With the new GA driver, I'm getting a lot of PAMI assertions when
>>>>>>>> running
>>>>>>>> ScaLAPACK programs. The traceback:
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> Program : ./xzsep
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> +++ID Rank: 0, TGID: 1, Core: 0, HWTID:0 TID: 1 State: RUN
>>>>>>>>
>>>>>>>> 00000000016a3638
>>>>>>>> abort
>>>>>>>>
>>>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/stdlib/abort.c:77
>>>>>>>>
>>>>>>>> 000000000169c668
>>>>>>>> __assert_fail
>>>>>>>>
>>>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/assert/assert.c:81
>>>>>>>>
>>>>>>>> 000000000149774c
>>>>>>>> PAMI::Device::MU::CollectiveDmaModelBase::postShortCollective(unsigned
>>>>>>>> int, unsigned int, unsigned int, char*, PAMI::PipeWorkQueue*, void
>>>>>>>> (*)(void*, void*, pami_result_t), void*, unsigned int)
>>>>>>>>
>>>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/components/devices/bgq/mu2/model/CollectiveDmaModelBase.h:284
>>>>>>>>
>>>>>>>> 0000000001497954
>>>>>>>>
>>>>>>>> PAMI::Device::MU::CollectiveMulticastDmaModel::postMulticastImmediate_impl(unsigned
>>>>>>>> long, unsigned long, pami_multicast_t*, void*)
>>>>>>>>
>>>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/components/devices/bgq/mu2/model/CollectiveMulticastDmaModel.h:107
>>>>>>>>
>>>>>>>> 0000000001394304
>>>>>>>>
>>>>>>>> PAMI::Geometry::Algorithm<PAMI::Geometry::Common>::generate(pami_xfer_t*)
>>>>>>>>
>>>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/algorithms/geometry/Algorithm.h:45
>>>>>>>>
>>>>>>>> 0000000001359dec
>>>>>>>> MPIDO_Bcast
>>>>>>>>
>>>>>>>> /bgsys/source/srcV1R1M0.5670/comm/lib/dev/mpich2/src/mpid/pamid/src/coll/bcast/mpido_bcast.c:146
>>>>>>>>
>>>>>>>> 00000000012f95cc
>>>>>>>> MPIR_Bcast_impl
>>>>>>>>
>>>>>>>> /bgsys/source/srcV1R1M0.5670/comm/lib/dev/mpich2/src/mpi/coll/bcast.c:1310
>>>>>>>>
>>>>>>>> 00000000012f997c
>>>>>>>> PMPI_Bcast
>>>>>>>>
>>>>>>>> /bgsys/source/srcV1R1M0.5670/comm/lib/dev/mpich2/src/mpi/coll/bcast.c:1464
>>>>>>>>
>>>>>>>> 000000000101b980
>>>>>>>> dgebr2d
>>>>>>>>
>>>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/BLACS/SRC/dgebr2d_.c:127
>>>>>>>>
>>>>>>>> 00000000010670f4
>>>>>>>> pdlared1d
>>>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/SRC/pdlared1d.f:156
>>>>>>>>
>>>>>>>> 0000000001045350
>>>>>>>> pzheevx
>>>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/SRC/pzheevx.f:839
>>>>>>>>
>>>>>>>> 00000000010084e0
>>>>>>>> pzsepsubtst
>>>>>>>>
>>>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzsepsubtst.f:396
>>>>>>>>
>>>>>>>> 0000000001002874
>>>>>>>> pzseptst
>>>>>>>>
>>>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzseptst.f:565
>>>>>>>>
>>>>>>>> 00000000010123ec
>>>>>>>> pzsepreq
>>>>>>>>
>>>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzsepreq.f:205
>>>>>>>>
>>>>>>>> 00000000010112e8
>>>>>>>> pzsepdriver
>>>>>>>>
>>>>>>>> /gpfs/veas-fs0/killough/libs/build/SCALAPACK-xl/TESTING/EIG/pzsepdriver.f:229
>>>>>>>>
>>>>>>>> 0000000001699b08
>>>>>>>> generic_start_main
>>>>>>>>
>>>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/csu/../csu/libc-start.c:226
>>>>>>>>
>>>>>>>> 0000000001699e04
>>>>>>>> __libc_start_main
>>>>>>>>
>>>>>>>> /bgsys/drivers/V1R1M0/ppc64/toolchain/gnu/glibc-2.12.2/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:194
>>>>>>>>
>>>>>>>> 0000000000000000
>>>>>>>> ??
>>>>>>>> ??:0
>>>>>>>>
>>>>>>>> It looks like an assertion is failing at:
>>>>>>>>
>>>>>>>> PAMI::Device::MU::CollectiveDmaModelBase::postShortCollective(unsigned
>>>>>>>> int, unsigned int, unsigned int, char*, PAMI::PipeWorkQueue*, void
>>>>>>>> (*)(void*, void*, pami_result_t), void*, unsigned int)
>>>>>>>>
>>>>>>>>
>>>>>>>> /bgsys/source/srcV1R1M0.5670/comm/sys/buildtools/pami/components/devices/bgq/mu2/model/CollectiveDmaModelBase.h:284
>>>>>>>>
>>>>>>>> during a broadcast.
>>>>>>>>
>>>>>>>> I don't recall these errors in the previous driver.
>>>>>>>>
>>>>>>>> Lee
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Hammond
>>>>>>> Argonne Leadership Computing Facility
>>>>>>> University of Chicago Computation Institute
>>>>>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>>>>> http://www.linkedin.com/in/jeffhammond
>>>>>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond (in-progress)
>>>>>>> https://wiki.alcf.anl.gov/old/index.php/User:Jhammond (deprecated)
>>>>>>> https://wiki-old.alcf.anl.gov/index.php/User:Jhammond(deprecated)