[MPICH] build mpich2 with Myrinet GM

Darius Buntinas buntinas at mcs.anl.gov
Thu Feb 28 13:13:47 CST 2008


Sorry, that last patch was against a different version.  Can you try
this patch?  You might get some warnings about changes having already
been applied, since the previous patch already made some of them.  You
can ignore those.

-d


On 02/28/2008 11:38 AM, Wei-keng Liao wrote:
> I got an error when I applied the patch:
> mercury::mpich2-1.0.6p1(11:34am) #448% patch -p0 < gm.patch
> patching file src/mpid/ch3/include/mpidpre.h
> patching file src/mpid/ch3/channels/nemesis/nemesis/net_mod/gm_module/gm_module_impl.h
> Hunk #1 succeeded at 51 (offset -1 lines).
> patching file src/mpid/ch3/channels/nemesis/nemesis/net_mod/gm_module/gm_module_poll.c
> patching file src/mpid/ch3/channels/nemesis/nemesis/net_mod/gm_module/gm_module_send.c
> Hunk #2 FAILED at 233.
> Hunk #3 succeeded at 265 (offset -80 lines).
> Hunk #4 succeeded at 343 (offset -81 lines).
> 1 out of 4 hunks FAILED -- saving rejects to file src/mpid/ch3/channels/nemesis/nemesis/net_mod/gm_module/gm_module_send.c.rej
> 
> mercury::mpich2-1.0.6p1(11:36am) #450% cat src/mpid/ch3/channels/nemesis/nemesis/net_mod/gm_module/gm_module_send.c.rej
> ***************
> *** 237,243 ****
>   {
>       int mpi_errno = MPI_SUCCESS;
>       char *dataptr;
> -     int datalen;
>       int complete;  
>   
>       while (active_send || !SEND_Q_EMPTY())
> --- 233,239 ----
>   {
>       int mpi_errno = MPI_SUCCESS;
>       char *dataptr;
> +     MPIDI_msg_sz_t datalen;
>       int complete;  
>   
>       while (active_send || !SEND_Q_EMPTY())
> 
> Wei-keng
> 
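The rejected hunk changes the declared type of `datalen` from `int` to `MPIDI_msg_sz_t`. The thread does not say why, but one common reason such a change matters on a 64-bit target like ia64 is that a variable whose address is handed to a routine must have exactly the type that routine writes through. The C sketch below illustrates this under stated assumptions: `msg_sz_t` is a hypothetical stand-in assumed to be pointer-width (the real definition of `MPIDI_msg_sz_t` is not shown in this thread), and `next_chunk()` and `remaining_bytes()` are hypothetical helpers, not MPICH functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MPIDI_msg_sz_t, ASSUMED here to be a
   pointer-width signed integer (8 bytes on ia64). */
typedef intptr_t msg_sz_t;

/* Hypothetical helper (not part of MPICH): reports where the unsent part
   of a buffer starts and how long it is.  It stores a full msg_sz_t
   through datalen, so the caller's variable must be a msg_sz_t. */
static void next_chunk(const char *buf, msg_sz_t total, msg_sz_t sent,
                       const char **dataptr, msg_sz_t *datalen)
{
    assert(sent >= 0 && sent <= total);
    *dataptr = buf + sent;
    *datalen = total - sent;
}

/* Caller declared the way the patched code declares its variable.
   Declaring datalen as int and casting &datalen to (msg_sz_t *) would
   compile, but on a 64-bit target the store in next_chunk() would write
   8 bytes into a 4-byte object and clobber whatever sits next to it. */
static msg_sz_t remaining_bytes(const char *buf, msg_sz_t total, msg_sz_t sent)
{
    const char *dataptr;
    msg_sz_t datalen;

    next_chunk(buf, total, sent, &dataptr, &datalen);
    assert(dataptr == buf + sent);
    return datalen;
}
```

Whether a mismatch of this kind explains the `send_pkt()` crash reported further down is not established in the thread; the sketch only shows why the declared type has to match.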
> 
> On Thu, 28 Feb 2008, Darius Buntinas wrote:
> 
>> Thanks for reporting this.  Here's a patch that should fix it.  Let me know if
>> you have any more trouble.
>>
>> Thanks,
>> -d
>>
>> On 02/27/2008 10:18 PM, Wei-keng Liao wrote:
>>> OK, the patch fixed the problem and I was able to build mpich. But when I
>>> ran the alltoallv test in test/mpi/coll using 4 processes, it failed with
>>> this error message:
>>>   rank 2 in job 1 tg-c527_40397 caused collective abort of all ranks
>>>   exit status of rank 2: killed by signal 9 
>>>
>>> The gdb on the coredump shows
>>> (gdb) where
>>> #0  0x20000000001c9120 in ?? ()
>>> #1  0x40000000000a71d0 in send_pkt ()
>>> #2  0x40000000000a6530 in MPID_nem_gm_iSendContig ()
>>> #3  0x40000000000ab000 in MPIDI_CH3_iSendv ()
>>> #4  0x400000000003e890 in MPIDI_CH3_EagerContigIsend ()
>>> #5  0x4000000000048160 in MPID_Isend ()
>>> #6  0x400000000000ec00 in MPIC_Isend ()
>>> #7  0x400000000000a8e0 in MPIR_Alltoallv ()
>>> #8  0x400000000000b2d0 in PMPI_Alltoallv ()
>>> #9  0x4000000000003670 in main ()
>>> #10 0x40000000000a71d0 in send_pkt ()
>>>
>>> Wei-keng
>>>
>>>
>>> On Wed, 27 Feb 2008, Darius Buntinas wrote:
>>>
>>>> Sorry about that.  I guess I didn't test this on an Itanium after making
>>>> some changes there.
>>>>
>>>> I've attached a patch file that should fix this.  I'm still not sure why
>>>> it's not working with your intel compiler.
>>>>
>>>> Apply the patch like this (from the mpich2 source directory):
>>>>   patch -p0 < ia64_atomics.patch
>>>>
>>>> Then run make clean and make.
>>>>
>>>> -d
>>>>
>>>> On 02/27/2008 11:33 AM, Wei-keng Liao wrote:
>>>>> I got a different error when I built mpich with gcc 3.2.2, while
>>>>> compiling nemesis/src/mpid_nem_alloc.c. (I used ifort for the FC
>>>>> environment variable.)
>>>>>
>>>>> In file included from ../include/mpid_nem_impl.h:13,
>>>>>                  from mpid_nem_alloc.c:7:
>>>>> ../include/mpid_nem_atomics.h: In function `MPID_NEM_SWAP':
>>>>> ../include/mpid_nem_atomics.h:27: warning: dereferencing `void *' pointer
>>>>> ../include/mpid_nem_atomics.h: In function `MPID_NEM_CAS':
>>>>> ../include/mpid_nem_atomics.h:54: warning: dereferencing `void *' pointer
>>>>> ../include/mpid_nem_atomics.h: In function `MPID_NEM_FETCH_AND_INC':
>>>>> ../include/mpid_nem_atomics.h:164: parse error before string constant
>>>>>
>>>>> Also, I tried Intel icc 8.1.037 and it failed with the same message as
>>>>> icc 9.0.032 and 9.1.046.
>>>>>
>>>>> Wei-keng
>>>>>
>>>>>
>>>>> On Tue, 26 Feb 2008, Darius Buntinas wrote:
>>>>>> It looks like the icc compiler you're using doesn't like the gcc-style
>>>>>> inline assembly code.
>>>>>>
>>>>>> What version of icc do you have?
>>>>>> Can you try compiling with gcc instead of icc?
>>>>>>
>>>>>> -d
>>>>>>
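The failing header implements its atomic primitives with GCC-style inline assembly and, judging from the `#error No swap function defined for this architecture` message reported in this thread, stops compilation when no variant matches the compiler/architecture pair. As a point of comparison only, the sketch below expresses the same three primitives named in the compiler output (swap, compare-and-swap, fetch-and-increment) with C11 `<stdatomic.h>`, which leaves instruction selection to the compiler. C11 postdates this thread and the compilers used in it, so this is not what MPICH 1.0.6 does; the `nem_*` names are hypothetical, and the operations are shown on `int` for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names, not MPICH's MPID_NEM_* macros. */

/* Atomically store val in *ptr and return the previous value. */
static inline int nem_swap(atomic_int *ptr, int val)
{
    assert(ptr != NULL);
    return atomic_exchange(ptr, val);
}

/* If *ptr equals expected, replace it with desired.  Either way, return
   the value *ptr held just before the operation: on failure,
   atomic_compare_exchange_strong copies the current value into expected. */
static inline int nem_cas(atomic_int *ptr, int expected, int desired)
{
    assert(ptr != NULL);
    atomic_compare_exchange_strong(ptr, &expected, desired);
    return expected;
}

/* Atomically add 1 to *ptr and return the value before the increment. */
static inline int nem_fetch_and_inc(atomic_int *ptr)
{
    assert(ptr != NULL);
    return atomic_fetch_add(ptr, 1);
}
```

A caller can compare the value returned by `nem_cas()` with the `expected` argument it passed to learn whether the exchange took place.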
>>>>>> On 02/26/2008 12:32 PM, Wei-keng Liao wrote:
>>>>>>> Attached are 3 files:
>>>>>>>
>>>>>>> out.configure  -  stdout from configure
>>>>>>> out.make       -  stdout from make
>>>>>>> config.log
>>>>>>>
>>>>>>> Wei-keng
>>>>>>>
>>>>>>> On Tue, 26 Feb 2008, Darius Buntinas wrote:
>>>>>>>
>>>>>>>> Can you send us the output of configure as well as config.log?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -d
>>>>>>>>
>>>>>>>> On 02/26/2008 11:35 AM, Wei-keng Liao wrote:
>>>>>>>>> I got an error during make:
>>>>>>>>>
>>>>>>>>> ../include/mpid_nem_atomics.h(31): catastrophic error: #error directive: No swap function defined for this architecture
>>>>>>>>>   #error No swap function defined for this architecture
>>>>>>>>>    ^
>>>>>>>>> compilation aborted for mpid_nem_alloc.c (code 4)
>>>>>>>>>
>>>>>>>>> I am using configure options:
>>>>>>>>>           --with-device=ch3:nemesis:gm  \
>>>>>>>>>           --with-gm=/opt/gm \
>>>>>>>>>           --enable-f77 --enable-f90 --enable-cxx \
>>>>>>>>>           --enable-fast \
>>>>>>>>>           --enable-romio \
>>>>>>>>>           --without-mpe \
>>>>>>>>>           --with-file-system=ufs
>>>>>>>>>
>>>>>>>>> and the command "uname -a" on the machine is
>>>>>>>>> Linux tg-login4 2.4.21-309.tg1 #1 SMP Thu Jun 1 17:07:28 CDT 2006 ia64 unknown
>>>>>>>>>
>>>>>>>>> I am using Intel compiler v 9.1.043
>>>>>>>>>
>>>>>>>>> Wei-keng
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 26 Feb 2008, Darius Buntinas wrote:
>>>>>>>>>> On 02/26/2008 10:08 AM, Wei-keng Liao wrote:
>>>>>>>>>>> I have a few questions on building mpich2-1.0.6p1 with the Myrinet
>>>>>>>>>>> GM library.
>>>>>>>>>>>
>>>>>>>>>>> On my target machine, the GM library (include, lib, bin, etc.) is
>>>>>>>>>>> in /opt/gm. Following the MPICH README, I used the 2 options below
>>>>>>>>>>> when configuring:
>>>>>>>>>>>     --with-device=ch3:nemesis:gm  and --with-gm=/opt/gm
>>>>>>>>>>>
>>>>>>>>>>> I can see both libgm.a and libgm.so are in /opt/gm/lib.
>>>>>>>>>>>
>>>>>>>>>>> Q1: Do I need other configure options or environment variables
>>>>>>>>>>>     (in addition to CC, FC, CXX, F90)? Should I set LDFLAGS to
>>>>>>>>>>>     "-L/opt/gm/lib -lgm"?
>>>>>>>>>> Nope, the --with-gm=/opt/gm should take care of all of that for
>>>>>>>>>> you.
>>>>>>>>>>
>>>>>>>>>>> Q2: Since nemesis does not support the MPI dynamic process
>>>>>>>>>>>     routines yet and I need those routines, can I use
>>>>>>>>>>>     --with-device=ch3:sock:gm instead?
>>>>>>>>>> No, only nemesis supports gm.
>>>>>>>>>>
>>>>>>>>>>> Q3: Do I need anything else (source code, libraries) from
>>>>>>>>>>>     Myrinet to build mpich, or is /opt/gm good enough?
>>>>>>>>>> All you need is libgm.a and gm.h.
>>>>>>>>>>
>>>>>>>>>>> Q4: Once mpich is built, is there a way to verify that GM is
>>>>>>>>>>>     actually used?
>>>>>>>>>> Well, you should see a performance improvement over using
>>>>>>>>>> sockets. Run a ping-pong test; you should see latencies around
>>>>>>>>>> 10us or less.
>>>>>>>>>>
>>>>>>>>>> -d
>>>>>>>>>>
>>
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gm-b.patch
Type: text/x-patch
Size: 4509 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20080228/9de204b1/attachment.bin>

