Thank you Rajeev and Gus!

I don't think there is a network connection problem, because I tried running the GNU gcc/gfortran build of WRF and everything works (I can also run it on 5 PCs together).

So, all else being equal, I have problems (on more than one PC) only with the Intel-compiled version of WRF: wrf.exe stops with MPI errors. Also, before starting wrf.exe I have to run another executable (real_nmm.exe), which runs correctly with both WRF builds.
About ulimit: I did set "ulimit -s unlimited" on each machine. In fact, I put that line in the .bash_profile of every node and verified it with "ulimit -a" on each PC.

So it really seems to be a "WRF Intel build <-> MPI" problem only...
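Just to be sure, I suppose I could also check the limit that the MPI-launched processes actually inherit on every node, with something like this (only a rough check, assuming the mpd ring is already up on all machines):

    # Print the stack limit as seen by the processes that mpiexec really starts,
    # not just by the interactive login shell.
    mpiexec -n 5 sh -c 'echo "`hostname`: stack = `ulimit -s`"'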
For this reason I'm really going crazy! :-(

Thank you!
Fabio.


2010/9/2 Gus Correa <gus@ldeo.columbia.edu>:
Hi Fabio

Besides Rajeev's suggestion:

You mentioned some "error stack" and that to run on one node successfully you did "ulimit -s unlimited".
This may be needed on *all nodes* running WRF.
It doesn't propagate to the other nodes if you set it on the command line or put the ulimit command in your job script, for instance.
It can be done in the resource manager (Torque, SGE, SLURM) startup script, or in the Linux limit configuration files.
Maybe your system administrator can help you with this.
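For instance, the system-wide route on Linux usually looks something like this (just a sketch: the exact mechanism, and whether it applies to ssh- or mpd-started processes, depends on your distribution and PAM setup):

    # As root on every node; log in again (and restart the mpd daemons)
    # afterwards so the new limit is actually inherited.
    echo '*   soft   stack   unlimited' >> /etc/security/limits.conf
    echo '*   hard   stack   unlimited' >> /etc/security/limits.conf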
FYI, a number of the large atmosphere/ocean/climate models we run produce a large program stack, often larger than the default limit set by Linux, and do require the change above on all nodes.
(I haven't run WRF, though.)

Also, to sort out whether your network has a problem, you may want to try something simpler than WRF.
The cpi.c program in the MPICH2 'examples' directory is good for this.
Compile it with mpicc and run it with mpirun on all nodes.
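Something along these lines, for example (the path and hosts file are placeholders, and mpdboot applies if you use MPICH2's default mpd process manager):

    cd /path/to/mpich2-1.2/examples    # wherever your MPICH2 source tree is
    mpicc -o cpi cpi.c                 # build the test program
    mpdboot -n 2 -f ~/mpd.hosts        # bring up the mpd ring on both machines
    mpirun -n 8 ./cpi                  # 8 ranks across the two quad-core nodes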
I hope it helps.
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Rajeev Thakur wrote:

Try running the cpi example from the MPICH2 examples directory across two machines. There could be a connection issue between the two machines.

Rajeev

On Sep 2, 2010, at 8:20 AM, Fabio F. Gervasi wrote:
Hi,

I have a "strange" MPI problem when I run the WRF-NMM model compiled with Intel v11.1.072 (the GNU build runs fine!).
MPICH2-1.2 is also compiled with Intel.

If I run on a single quad-core machine everything is fine, but when I try on two or more quad-core machines,
the wrf.exe processes initially seem to start on every PC, but after a few seconds wrf stops and I get the error:
"Fatal error in MPI_Allreduce: other MPI error, error stack: ..." and so on.

I had already set "ulimit -s unlimited", otherwise wrf crashes even on a single machine.

This is probably an MPI problem, but how can I fix it?

Thank you very much,
Fabio.