[mpich-discuss] [mpich2-announce] Announcing the availability of MPICH2-1.1b1
Dave Goodell
goodell at mcs.anl.gov
Mon Mar 30 15:53:38 CDT 2009
Hi Joe,
Thanks for the feedback. I've included my comments inline below.
On Mar 30, 2009, at 12:59 PM, Joe Ratterman wrote:
> Dave and Pavan,
>
> 1)
> The most subtle problem I found related to the use of
> MPID_Segment_pack by MPIR_Localcopy in src/mpi/coll/helper_fns.c.
> It passes the variable "last" of type "MPIDI_msg_sz_t" as "&last" to
> the pack/unpack functions, which are expecting an MPI_Aint*. On our
> system, MPIDI_msg_sz_t is an unsigned, and MPI_Aint is a long long.
> This means that MPID_Segment_pack clobbers too much memory. To fix
> this, I changed "last" to be an MPI_Aint, and it works fine. I
> think that means that
> A) "last" should be an MPI_Aint, or
> B) Segment_*pack() should take a MPIDI_msg_sz_t*, or
> C) MPIDI_msg_sz_t should be the same as MPI_Aint.
>
> I don't really know what the answer is. However, I caught this
> thanks to a compiler warning, and our other uses of MPIDI_msg_sz_t
> do not generate warnings. I think that means that MPIDI_msg_sz_t is
> the correct size.
It looks like solution A is probably the right one. We use MPI_Aint
for the "last" declaration on line 271, but not in the other two
declarations for some reason. I'll file a bug for this.
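To make the mismatch in item 1 concrete, here is a minimal sketch in C. It is not the actual MPICH2 code: msg_sz_t, aint_t, segment_pack_sketch, bytes_overrun, and pack_with_aint_last are hypothetical stand-ins, and the typedefs simply assume the sizes Joe reported (an unsigned for MPIDI_msg_sz_t, a long long for MPI_Aint). It shows how many bytes the callee would write past a narrower "last", and what solution A looks like.

```c
#include <stddef.h>

/* Hypothetical stand-ins, assuming the sizes reported for Joe's system. */
typedef unsigned int msg_sz_t; /* plays the role of MPIDI_msg_sz_t */
typedef long long aint_t;      /* plays the role of MPI_Aint */

/* Hypothetical pack routine: it stores a full aint_t through 'last',
 * which is the behavior described for the pack/unpack functions. */
static void segment_pack_sketch(aint_t first, aint_t *last)
{
    *last = first + 100;
}

/* Bytes that would be written past the end of a msg_sz_t object if its
 * address were passed where an aint_t* is expected. */
static size_t bytes_overrun(void)
{
    return sizeof(aint_t) > sizeof(msg_sz_t)
               ? sizeof(aint_t) - sizeof(msg_sz_t)
               : 0;
}

/* Solution A: declare 'last' with the type the callee expects, then
 * narrow explicitly afterwards if a msg_sz_t is needed. */
static msg_sz_t pack_with_aint_last(void)
{
    aint_t last = 0;
    segment_pack_sketch(0, &last);
    return (msg_sz_t)last;
}
```

Calling pack_with_aint_last() keeps the store inside an object of the right size, so nothing adjacent on the stack is touched. On a platform with a 4-byte unsigned and an 8-byte long long, bytes_overrun() would report 4 bytes of overwrite for the original declaration.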
> 2)
> I'm still having issues with "MPIU_Find_local_and_external". I am
> getting the fall-back version (judging from the line numbers), but
> it prints a lot of these messages:
> stderr[2]: Internal Error: invalid error code 1312d10 (Ring Index
> out of range) in MPIU_Find_local_and_external:213
> stderr[0]: Internal Error: invalid error code 1312d10 (Ring Index
> out of range) in MPIU_Find_local_and_external:213
> stderr[3]: Internal Error: invalid error code 1312d10 (Ring Index
> out of range) in MPIU_Find_local_and_external:213
> stderr[1]: Internal Error: invalid error code 1312d10 (Ring Index
> out of range) in MPIU_Find_local_and_external:213
>
> Changing the whole of that function to "return MPI_ERR_UNKNOWN;"
> still seems to work fine. Is there a way to be sure that the
> "MPIU_ERR_SETANDJUMP" doesn't print anything?
That's strange; I'm not sure why MPIU_ERR_SETANDJUMP is erroring
internally. "**notimpl" should be a valid error code [1]. However, I
can reproduce this when I force CH3 to undef MPID_USE_NODE_IDS in
mpidpre.h, so it's not something BG-specific. It's probably a quirk
in the usage of the error handling macros. I'll take a closer look
soon to figure out what's really going on here.
-Dave
[1] https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/tags/release/mpich2-1.1b1/src/mpi/errhan/errnames.txt#L17