[MPICH] build mpich2 with Myrinet GM

Darius Buntinas buntinas at mcs.anl.gov
Thu Feb 28 11:25:08 CST 2008


Thanks for reporting this.  Here's a patch that should fix it.  Let me 
know if you have any more trouble.

Thanks,
-d

On 02/27/2008 10:18 PM, Wei-keng Liao wrote:
> OK, the patch fixed the problem and I was able to build mpich. But 
> when I ran the alltoallv test in test/mpi/coll using 4 processes, it 
> failed with this error message:
>   rank 2 in job 1 tg-c527_40397 caused collective abort of all ranks
>   exit status of rank 2: killed by signal 9
> 
> Running gdb on the core dump shows:
> (gdb) where
> #0  0x20000000001c9120 in ?? ()
> #1  0x40000000000a71d0 in send_pkt ()
> #2  0x40000000000a6530 in MPID_nem_gm_iSendContig ()
> #3  0x40000000000ab000 in MPIDI_CH3_iSendv ()
> #4  0x400000000003e890 in MPIDI_CH3_EagerContigIsend ()
> #5  0x4000000000048160 in MPID_Isend ()
> #6  0x400000000000ec00 in MPIC_Isend ()
> #7  0x400000000000a8e0 in MPIR_Alltoallv ()
> #8  0x400000000000b2d0 in PMPI_Alltoallv ()
> #9  0x4000000000003670 in main ()
> #10 0x40000000000a71d0 in send_pkt ()
> 
> Wei-keng
> 
> 
> On Wed, 27 Feb 2008, Darius Buntinas wrote:
> 
>> Sorry about that.  I guess I didn't test this on an Itanium after making 
>> some changes there.
>>
>> I've attached a patch file that should fix this.  I'm still not sure why 
>> it's not working with your Intel compiler.
>>
>> Apply the patch like this (from the mpich2 source directory)
>>   patch -p0 < ia64_atomics.patch
>>
>> Then do a make clean and make.
>>
>> -d
>>
>> On 02/27/2008 11:33 AM, Wei-keng Liao wrote:
>>> I got a different error when I built mpich with gcc 3.2.2, while compiling
>>> the file nemesis/src/mpid_nem_alloc.c. (I set the FC environment variable
>>> to ifort.)
>>>
>>> In file included from ../include/mpid_nem_impl.h:13,
>>>                  from mpid_nem_alloc.c:7:
>>> ../include/mpid_nem_atomics.h: In function `MPID_NEM_SWAP':
>>> ../include/mpid_nem_atomics.h:27: warning: dereferencing `void *' pointer
>>> ../include/mpid_nem_atomics.h: In function `MPID_NEM_CAS':
>>> ../include/mpid_nem_atomics.h:54: warning: dereferencing `void *' pointer
>>> ../include/mpid_nem_atomics.h: In function `MPID_NEM_FETCH_AND_INC':
>>> ../include/mpid_nem_atomics.h:164: parse error before string constant
>>>
>>> Also, I tried Intel icc 8.1.037 and it failed with the same message as
>>> icc 9.0.032 and 9.1.046.
>>>
>>> Wei-keng
>>>
>>>
>>> On Tue, 26 Feb 2008, Darius Buntinas wrote:
>>>> It looks like the icc compiler you're using doesn't like the gcc-style
>>>> inline assembly code.
>>>>
>>>> What version of icc do you have?
>>>> Can you try compiling with gcc instead of icc?
>>>>
>>>> -d
>>>>
>>>> On 02/26/2008 12:32 PM, Wei-keng Liao wrote:
>>>>> Attached are 3 files:
>>>>>
>>>>> out.configure  -  stdout from configure
>>>>> out.make       -  stdout from make
>>>>> config.log
>>>>>
>>>>> Wei-keng
>>>>>
>>>>> On Tue, 26 Feb 2008, Darius Buntinas wrote:
>>>>>
>>>>>> Can you send us the output of configure as well as config.log?
>>>>>>
>>>>>> Thanks,
>>>>>> -d
>>>>>>
>>>>>> On 02/26/2008 11:35 AM, Wei-keng Liao wrote:
>>>>>>> I got an error during make:
>>>>>>>
>>>>>>> ../include/mpid_nem_atomics.h(31): catastrophic error: #error directive: No swap function defined for this architecture
>>>>>>>   #error No swap function defined for this architecture
>>>>>>>    ^
>>>>>>> compilation aborted for mpid_nem_alloc.c (code 4)
>>>>>>>
>>>>>>> I am using configure options:
>>>>>>>           --with-device=ch3:nemesis:gm  \
>>>>>>>           --with-gm=/opt/gm \
>>>>>>>           --enable-f77 --enable-f90 --enable-cxx \
>>>>>>>           --enable-fast \
>>>>>>>           --enable-romio \
>>>>>>>           --without-mpe \
>>>>>>>           --with-file-system=ufs
>>>>>>>
>>>>>>> and the output of "uname -a" on the machine is
>>>>>>> Linux tg-login4 2.4.21-309.tg1 #1 SMP Thu Jun 1 17:07:28 CDT 2006 ia64 unknown
>>>>>>>
>>>>>>> I am using Intel compiler v 9.1.043
>>>>>>>
>>>>>>> Wei-keng
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 26 Feb 2008, Darius Buntinas wrote:
>>>>>>>> On 02/26/2008 10:08 AM, Wei-keng Liao wrote:
>>>>>>>>> I have a few questions about building mpich2-1.0.6p1 with the
>>>>>>>>> Myrinet GM library.
>>>>>>>>>
>>>>>>>>> On my target machine, the GM library (include, lib, bin, etc.)
>>>>>>>>> is in /opt/gm. According to the MPICH README, I used the 2
>>>>>>>>> options below when configuring:
>>>>>>>>>     --with-device=ch3:nemesis:gm  and --with-gm=/opt/gm
>>>>>>>>>
>>>>>>>>> I can see both libgm.a and libgm.so are in /opt/gm/lib.
>>>>>>>>>
>>>>>>>>> Q1: Do I need any other configure options or environment
>>>>>>>>>     variables (in addition to CC, FC, CXX, F90)? Should I set
>>>>>>>>>     LDFLAGS to "-L/opt/gm/lib -lgm"?
>>>>>>>> Nope, the --with-gm=/opt/gm should take care of all of that for you.
>>>>>>>>
>>>>>>>>> Q2: Since nemesis does not support MPI dynamic process routines
>>>>>>>>>     yet and I need those routines, can I use
>>>>>>>>>     --with-device=ch3:sock:gm instead?
>>>>>>>> No, only nemesis supports gm.
>>>>>>>>
>>>>>>>>> Q3: Do I need anything else (source code, libraries) from Myrinet
>>>>>>>>>     to build mpich? Or is /opt/gm sufficient?
>>>>>>>> All you need is libgm.a and gm.h.
>>>>>>>>
>>>>>>>>> Q4: Once mpich is built, is there a way to verify that GM is
>>>>>>>>>     actually used?
>>>>>>>> Well, you should see a performance improvement over using sockets.
>>>>>>>> Run a ping-pong test; you should see latencies around 10us or less.
>>>>>>>>
>>>>>>>> -d
>>>>>>>>
>>
> 
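
The build errors quoted above all come from ../include/mpid_nem_atomics.h,
which (judging by the gcc messages) defines MPID_NEM_SWAP, MPID_NEM_CAS and
MPID_NEM_FETCH_AND_INC, and which Darius describes as using gcc-style inline
assembly. The sketch below is not the contents of either patch in this thread
and is not MPICH2 code. It only illustrates, under stated assumptions, how the
same three operations could be written without inline assembly. It assumes a
compiler that provides the __sync builtins (GCC 4.1 or later, so not the gcc
3.2.2 mentioned above). All names starting with demo_ are hypothetical.

```c
/* Sketch only: NOT the MPICH2 implementation of mpid_nem_atomics.h.
 * All demo_ names are hypothetical. Assumes the compiler provides
 * the __sync builtins (GCC 4.1 or later). */
#include <stddef.h>

/* Atomically store val into *ptr and return the previous value. */
static inline void *demo_swap(void *volatile *ptr, void *val)
{
    /* __sync_lock_test_and_set is only an acquire barrier,
     * so issue a full barrier before it. */
    __sync_synchronize();
    return __sync_lock_test_and_set(ptr, val);
}

/* If *ptr == oldv, store newv. Always returns the value seen in *ptr. */
static inline void *demo_cas(void *volatile *ptr, void *oldv, void *newv)
{
    return __sync_val_compare_and_swap(ptr, oldv, newv);
}

/* Atomically add 1 to *ptr and return the value it held before. */
static inline int demo_fetch_and_inc(volatile int *ptr)
{
    return __sync_fetch_and_add(ptr, 1);
}

/* Single-threaded sanity check; returns 0 if every step behaves as expected. */
int demo_atomics_selfcheck(void)
{
    int a = 1, b = 2;
    void *volatile slot = &a;
    volatile int counter = 5;

    /* swap returns the old pointer and installs the new one */
    if (demo_swap(&slot, &b) != (void *)&a || slot != (void *)&b) return 1;
    /* compare fails (slot holds &b, not &a): slot must be unchanged */
    if (demo_cas(&slot, &a, &a) != (void *)&b || slot != (void *)&b) return 2;
    /* compare succeeds: slot becomes &a */
    if (demo_cas(&slot, &b, &a) != (void *)&b || slot != (void *)&a) return 3;
    /* fetch-and-inc returns the old value and leaves old + 1 behind */
    if (demo_fetch_and_inc(&counter) != 5 || counter != 6) return 4;
    return 0;
}
```

Calling demo_atomics_selfcheck() from a small main() and checking for a zero
return is a quick way to see whether a given compiler accepts these builtins.
It checks single-threaded behaviour only, and says nothing about whether the
alltoallv crash reported above is related to the atomics.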
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gm.patch
Type: text/x-patch
Size: 4465 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20080228/8b08c2e5/attachment.bin>