<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi,<br>
<br>
I've recently compiled and installed mpich2-1.3rc2 and mvapich2 1.5.1p1 with
knem support enabled (using the options --with-device=ch3:nemesis
--with-nemesis-local-lmt=knem --with-knem=/usr/local/knem). The
version of knem that I use is 0.9.2.<br>
<br>
Doing a cat of /dev/knem gives:<br>
<font face="Courier New, Courier, monospace"><small>knem 0.9.2<br>
Driver ABI=0xc<br>
Flags: forcing 0x0, ignoring 0x0<br>
DMAEngine: KernelSupported Enabled NoChannelAvailable<br>
Debug: NotBuilt<br>
Requests Submitted : 119406<br>
Requests Processed/DMA : 0<br>
Requests Processed/Thread : 0<br>
Requests Processed/PinLocal : 0<br>
Requests Failed/NoMemory : 0<br>
Requests Failed/ReadCmd : 0<br>
Requests Failed/FindRegion : 6<br>
Requests Failed/Pin : 0<br>
Requests Failed/MemcpyToUser: 0<br>
Requests Failed/MemcpyPinned: 0<br>
Requests Failed/DMACopy : 0<br>
Dmacpy Cleanup Timeout : 0</small></font><br>
<br>
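For reference, the same status text can also be read programmatically;
below is a minimal sketch using plain POSIX I/O only (no knem-specific
ioctls are assumed here), equivalent to the cat above:<br>
<pre>
/* Dump the knem status text, equivalent to `cat /dev/knem`.
 * Only standard POSIX calls are used; nothing knem-specific is assumed. */
#include &lt;fcntl.h>
#include &lt;stdio.h>
#include &lt;unistd.h>

int main(void)
{
    int fd = open("/dev/knem", O_RDONLY);
    if (fd &lt; 0) {
        perror("open /dev/knem"); /* device node missing or module not loaded */
        return 1;
    }
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    close(fd);
    return 0;
}
</pre>
<br>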
I ran several tests using the IMB and OSU benchmarks. All tests look
fine (and I get good bandwidth results, comparable to what I could
get with limic2), except for the osu_bibw test from the OSU suite,
which fails with the following error with mpich2:<br>
<br>
<small><font face="Courier New, Courier, monospace"># OSU MPI
Bi-Directional Bandwidth Test v3.1.2<br>
# Size Bi-Bandwidth (MB/s)<br>
1 3.41<br>
2 7.15<br>
4 12.06<br>
8 39.66<br>
16 73.20<br>
32 156.94<br>
64 266.58<br>
128 370.34<br>
256 977.24<br>
512 2089.85<br>
1024 3498.96<br>
2048 5543.29<br>
4096 7314.23<br>
8192 8381.86<br>
16384 9291.81<br>
32768 5948.53<br>
Fatal error in PMPI_Waitall: Other MPI error, error stack:<br>
PMPI_Waitall(274)...............: MPI_Waitall(count=64,
req_array=0xa11a20, status_array=0xe23960) failed<br>
MPIR_Waitall_impl(121)..........: <br>
MPIDI_CH3I_Progress(393)........: <br>
MPID_nem_handle_pkt(573)........: <br>
pkt_RTS_handler(241)............: <br>
do_cts(518).....................: <br>
MPID_nem_lmt_dma_start_recv(365): <br>
MPID_nem_lmt_send_COOKIE(173)...: ioctl failed errno=22 -
Invalid argument<br>
APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)</font></small><br>
<br>
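(As a side note, errno 22 is EINVAL on Linux, so the "Invalid argument"
string above comes straight from strerror(), and the ioctl reported at
MPID_nem_lmt_send_COOKIE(173) is being rejected by the knem driver as an
invalid argument. A trivial check, using only the standard C library:)<br>
<pre>
/* Confirm that errno 22 maps to EINVAL ("Invalid argument"),
 * the error reported in the MPICH2 stack above. */
#include &lt;errno.h>
#include &lt;stdio.h>
#include &lt;string.h>

int main(void)
{
    printf("errno 22 = %s (EINVAL = %d)\n", strerror(22), EINVAL);
    return 0;
}
</pre>
<br>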
The error seems to come from the nemesis source, namely the
mpid_nem_lmt_dma.c file which uses knem, but I don't really know what
happens there, and I don't see anything special in that test, which simply
measures the bi-directional bandwidth. On another machine, I get the
following error with mvapich2:<br>
<br>
<small><font face="Courier New, Courier, monospace"># OSU MPI
Bi-Directional Bandwidth Test v3.1.2<br>
# Size Bi-Bandwidth (MB/s)<br>
1 1.92<br>
2 3.86<br>
4 7.72<br>
8 15.44<br>
16 30.75<br>
32 61.44<br>
64 122.54<br>
128 232.62<br>
256 416.85<br>
512 718.60<br>
1024 1148.63<br>
2048 1462.37<br>
4096 1659.45<br>
8192 2305.22<br>
16384 3153.85<br>
32768 3355.30<br>
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal
15)</font></small><br>
<br>
Attaching gdb gives the following:<br>
<small><font face="Courier New, Courier, monospace">Program received
signal SIGSEGV, Segmentation fault.<br>
0x00007f12691e4c99 in MPID_nem_lmt_dma_progress ()<br>
at
/project/csvis/soumagne/apps/src/eiger/mvapich2-1.5.1p1/src/mpid/ch3/channels/nemesis/nemesis/src/mpid_nem_lmt_dma.c:484<br>
484 prev->next = cur->next;</font></small><br>
<br>
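For context, the statement at line 484 above is a plain singly-linked-list
unlink of a request from a queue. A minimal sketch of that operation
(hypothetical types and names, not the actual mpid_nem_lmt_dma.c
structures) shows why a NULL or stale prev pointer would segfault at
exactly that statement:<br>
<pre>
/* Sketch of a singly-linked-list unlink, the operation at the
 * `prev->next = cur->next;` statement in the backtrace.
 * The struct and function names here are hypothetical. */
#include &lt;stdio.h>

struct req { struct req *next; int id; };

/* Remove cur from the list headed by *head.  If prev is NULL, cur is the
 * first element and the head pointer must be updated instead; dereferencing
 * a NULL or already-freed prev here gives exactly this kind of segfault. */
static void unlink_req(struct req **head, struct req *prev, struct req *cur)
{
    if (prev == NULL)
        *head = cur->next;        /* cur was the head of the list */
    else
        prev->next = cur->next;   /* splice cur out of the middle */
}

int main(void)
{
    struct req c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
    struct req *head = &a;
    unlink_req(&head, &a, &b);    /* drop the middle element */
    for (struct req *r = head; r; r = r->next)
        printf("req %d\n", r->id); /* prints 1 then 3 */
    return 0;
}
</pre>
<br>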
Is there something wrong in my mpich2/knem configuration, or does
anyone know where this problem comes from? (The osu_bibw.c file
is attached.)<br>
<br>
Thanks in advance<br>
<br>
Jerome<br>
<br>
<pre class="moz-signature" cols="72">--
Jérôme Soumagne
Scientific Computing Research Group
CSCS, Swiss National Supercomputing Centre
Galleria 2, Via Cantonale | Tel: +41 (0)91 610 8258
CH-6928 Manno, Switzerland | Fax: +41 (0)91 610 8282</pre>
</body>
</html>