ch3:sock won't perform as well as ch3:nemesis, though.

Rajeev
________________________________
From: mpich-discuss-bounces@mcs.anl.gov [mailto:mpich-discuss-bounces@mcs.anl.gov] On Behalf Of Cye Stoner
Sent: Monday, September 28, 2009 4:32 PM
To: mpich-discuss@mcs.anl.gov
Subject: Re: [mpich-discuss] Problems running mpi application on different CPUs
When deploying MPICH2 to a small cluster, I noticed that many people had
problems with the "--with-device=ch3:nemesis" device.
Try configuring with "--with-device=ch3:sock" instead.

Cye
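A minimal sketch of that rebuild, reusing the flags from Gaetano's configure
line quoted below; the separate "-sock" install prefix is only a suggestion so
the existing ch3:nemesis build is not overwritten:

    ./configure --prefix=/opt/mpich2/1.1/intel11.1-sock \
                --with-device=ch3:sock \
                --enable-cxx --enable-f90 --enable-romio \
                --with-file-system=nfs+ufs+pvfs2 --with-pvfs2=/usr/local \
                --with-pm=mpd:hydra
    make && make install

The application then needs to be recompiled against the wrappers (mpicc,
mpif90) from that new prefix before rerunning the machinefile test.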
On Mon, Sep 28, 2009 at 12:13 PM, Rajeev Thakur <thakur@mcs.anl.gov> wrote:

Try using the mpdcheck utility to debug as described in the appendix of
the installation guide. Pick one client and the server.

Rajeev
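A sketch of the mpdcheck sequence described in that appendix, assuming the
server and the chosen client can already resolve each other's hostnames; run
it in both directions:

    # on the server: listen and print the hostname and port to use
    mpdcheck -s

    # on the chosen client: connect back with the host/port printed above
    mpdcheck -c <server_hostname> <port>

Then swap the roles (mpdcheck -s on the client, mpdcheck -c on the server) to
catch one-way firewall blocks.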
> -----Original Message-----
> From: mpich-discuss-bounces@mcs.anl.gov
> [mailto:mpich-discuss-bounces@mcs.anl.gov] On Behalf Of Gaetano Bellanca
> Sent: Monday, September 28, 2009 6:00 AM
> Cc: mpich-discuss@mcs.anl.gov
> Subject: Re: [mpich-discuss] Problems running mpi application
> on different CPUs
>
> Dear Rajeev,
>
> thanks for your help. I disabled the firewall on the server (the only
> machine running one) and tried every other combination.
> All the clients together run correctly, and so do the processors on
> the server by themselves.
> The problem appears only when I mix processes on the server and on
> the clients.
>
> When I run mpdtrace on the server, all the CPUs respond correctly.
> The same happens if I run 'hostname' in parallel.
>
> It is probably a problem in my code, but the code works on a cluster
> of 10 Pentium IV PEs.
> I discovered a strange behavior:
> 1) running the code with the server as the first machine of the pool,
> the code hangs with the previously reported error;
> 2) if I put the server as the second machine of the pool, the code
> starts and works regularly up to the writing procedures, opens the
> first file and then waits indefinitely for something.
>
> Should I compile mpich2 with some particular channel? I have nemesis
> at the moment.
> I'm using this for the mpich2 compilation:
>
> ./configure --prefix=/opt/mpich2/1.1/intel11.1 --enable-cxx \
>     --enable-f90 --enable-fast --enable-traceback --with-mpe \
>     --enable-f90modules --enable-cache --enable-romio \
>     --with-file-system=nfs+ufs+pvfs2 --with-device=ch3:nemesis \
>     --with-pvfs2=/usr/local \
>     --with-java=/usr/lib/jvm/java-6-sun-1.6.0.07/ --with-pm=mpd:hydra
>
> Regards
>
> Gaetano
>
> Rajeev Thakur wrote:
> > Try running on smaller subsets of the machines to debug the
> > problem. It is possible that a process on some machine cannot
> > connect to another because of some firewall settings.
> >
> > Rajeev
> >
> >
> >> -----Original Message-----
> >> From: mpich-discuss-bounces@mcs.anl.gov
> >> [mailto:mpich-discuss-bounces@mcs.anl.gov] On Behalf Of
> >> Gaetano Bellanca
> >> Sent: Saturday, September 26, 2009 7:10 AM
> >> To: mpich-discuss@mcs.anl.gov
> >> Subject: [mpich-discuss] Problems running mpi application on
> >> different CPUs
> >>
> >> Hi,
> >>
> >> I'm sorry, but I posted my previous message with the wrong subject!
> >>
> >> I have a small cluster of
> >> a) 1 server: dual-processor / quad-core Intel(R) Xeon(R) CPU E5345
> >> b) 4 clients: single-processor / dual-core Intel(R) Core(TM)2 Duo
> >> CPU E8400, connected with a 1 Gbit/s Ethernet network.
> >>
> >> I compiled mpich2-1.1.1p1 on the first system (a) and share mpich
> >> with the other computers via NFS. I have mpd running as root on
> >> all the computers (Ubuntu 8.04, kernel 2.6.24-24-server).
> >>
> >> When I run my code in parallel on the first system, it works
> >> correctly; the same happens when I run it in parallel on the other
> >> computers (always launching from the server). On the contrary,
> >> when I run the code using processors from both the server and the
> >> clients at the same time with the command:
> >>
> >> mpiexec -machinefile machinefile -n 4 my_parallel_code
> >>
> >> I receive this error message:
> >>
> >> Fatal error in MPI_Init: Other MPI error, error stack:
> >> MPIR_Init_thread(394): Initialization failed
> >> (unknown)(): Other MPI error
> >> rank 3 in job 8 c1_4545 caused collective abort of all ranks
> >> exit status of rank 3: return code 1
> >>
> >> Should I use some particular flags at compile time or at run time?
> >>
> >> Regards.
> >>
> >> Gaetano
> >>
> >> --
> >> Gaetano Bellanca - Department of Engineering - University of Ferrara
> >> Via Saragat, 1 - 44100 - Ferrara - ITALY
> >> Voice (VoIP): +39 0532 974809  Fax: +39 0532 974870
> >> mailto:gaetano.bellanca@unife.it
> >>
> >> Education is expensive? They're trying ignorance!
> >>
> >>
> >
> >
>
> --
> Gaetano Bellanca - Department of Engineering - University of Ferrara
> Via Saragat, 1 - 44100 - Ferrara - ITALY
> Voice (VoIP): +39 0532 974809  Fax: +39 0532 974870
> mailto:gaetano.bellanca@unife.it
>
> Education is expensive? They're trying ignorance!
>
>
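The sanity checks discussed in the quoted thread, collected in one place; the
commands are the ones from the quoted messages, with 'hostname' substituted
for the application as Gaetano describes:

    # confirm every mpd in the ring is visible, with hostnames and ports
    mpdtrace -l

    # a non-MPI smoke test across the same machinefile as the real run
    mpiexec -machinefile machinefile -n 4 hostname

    # on each Ubuntu node, list the active firewall rules; MPI processes
    # open additional dynamic TCP ports between hosts, so those must not
    # be blocked
    sudo iptables -L -n

If the hostname test succeeds on every mixed server/client combination but
MPI_Init still fails, the mpdcheck procedure above (run in both directions)
is the next thing to try.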
--
"If you already know what recursion is, just remember the answer.
Otherwise, find someone who is standing closer to Douglas Hofstadter
than you are; then ask him or her what recursion is." - Andrew Plotkin