[mpich-discuss] [mpich2-announce] Announcing the availability of MPICH2-1.1b1

Dave Goodell goodell at mcs.anl.gov
Mon Mar 30 15:53:38 CDT 2009


Hi Joe,

Thanks for the feedback.  I've included my comments inline below.

On Mar 30, 2009, at 12:59 PM, Joe Ratterman wrote:

> Dave and Pavan,
>
> 1)
> The most subtle problem I found related to the use of  
> MPID_Segment_pack by MPIR_Localcopy in src/mpi/coll/helper_fns.c.   
> It passes the variable "last" of type "MPIDI_msg_sz_t" as "&last" to  
> the pack/unpack functions, which are expecting an MPI_Aint*.  On our  
> system, MPIDI_msg_sz_t is an unsigned, and MPI_Aint is a long long.   
> This means that MPID_Segment_pack clobbers too much memory.  To fix  
> this, I changed "last" to be an MPI_Aint, and it works fine.  I  
> think that means that
> A)  "last" should be an MPI_Aint, or
> B)  Segment_*pack() should take a MPIDI_msg_sz_t*, or
> C)  MPIDI_msg_sz_t should be the same as MPI_Aint.
>
> I don't really know what the answer is.  However, I caught this  
> thanks to a compiler warning, and our other uses of MPIDI_msg_sz_t  
> do not generate warnings.  I think that means that MPIDI_msg_sz_t is  
> the correct size.

It looks like solution A is probably the right one.  We use MPI_Aint
for the "last" declaration on line 271, but not in the other two
declarations, for some reason.  I'll file a bug for this.
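
For illustration, here is a minimal sketch of what solution A amounts to. It is not the actual helper_fns.c code: aint_t, msg_sz_t, pack_stub and packed_bytes are hypothetical stand-ins, with widths chosen to mirror the platform described above (unsigned vs. long long). The point is only that the variable whose address is passed must have the type the callee writes through, and any narrowing happens explicitly after the call.

```c
#include <assert.h>

/* Hypothetical stand-ins for the types discussed above; these are NOT
 * the real MPICH2 definitions, just the widths from Joe's platform. */
typedef long long aint_t;       /* plays the role of MPI_Aint */
typedef unsigned int msg_sz_t;  /* plays the role of MPIDI_msg_sz_t */

/* Stub with the same shape as a pack call: it stores a full aint_t
 * through `last`, so the caller must pass the address of an aint_t. */
void pack_stub(aint_t first, aint_t *last)
{
    /* Pretend everything up to the requested end was packed. */
    if (*last < first)
        *last = first;
}

/* Solution A: declare `last` with the type the callee expects, then
 * narrow explicitly once the call has returned. */
msg_sz_t packed_bytes(msg_sz_t nbytes)
{
    aint_t last = (aint_t)nbytes;  /* not msg_sz_t: &last must be aint_t* */
    pack_stub(0, &last);
    /* Check the range before converting back to the narrower type. */
    assert(last >= 0 && last <= (aint_t)nbytes);
    return (msg_sz_t)last;
}
```

Had "last" been declared as msg_sz_t here, passing &last would draw the same incompatible-pointer warning Joe saw, and on a platform where the two types differ in size the store in the callee would write past the variable.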

> 2)
> I'm still having issues with "MPIU_Find_local_and_external".  I am
> getting the fall-back version (judging from the line numbers), but
> it prints a lot of these messages:
> stderr[2]: Internal Error: invalid error code 1312d10 (Ring Index out of range) in MPIU_Find_local_and_external:213
> stderr[0]: Internal Error: invalid error code 1312d10 (Ring Index out of range) in MPIU_Find_local_and_external:213
> stderr[3]: Internal Error: invalid error code 1312d10 (Ring Index out of range) in MPIU_Find_local_and_external:213
> stderr[1]: Internal Error: invalid error code 1312d10 (Ring Index out of range) in MPIU_Find_local_and_external:213
>
> Changing the whole of that function to "return MPI_ERR_UNKNOWN;"  
> still seems to work fine.  Is there a way to be sure that the  
> "MPIU_ERR_SETANDJUMP" doesn't print anything?

That's strange.  I'm not sure why MPIU_ERR_SETANDJUMP is failing
internally.  "**notimpl" should be a valid error code [1].  However, I
can reproduce this when I force CH3 to undef MPID_USE_NODE_IDS in  
mpidpre.h, so it's not something BG-specific.  It's probably a quirk  
in the usage of the error handling macros.  I'll take a closer look  
soon to figure out what's really going on here.

-Dave

[1] https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/tags/release/mpich2-1.1b1/src/mpi/errhan/errnames.txt#L17


