[mpich-discuss] Problems Running WRF on Ubuntu 11.10, MPICH2

Sukanta Basu sukanta.basu at gmail.com
Tue Feb 7 10:43:20 CST 2012


Hi,

I am using a small cluster of 4 nodes (each with 8 cores and 24 GB RAM)
running Ubuntu 11.10. The cluster uses an NFS file system and GigE
connections.

I installed MPICH2 and ran the cpi.c program successfully.

I installed WRF (http://www.wrf-model.org/index.php) using the Intel
compilers (dmpar option).
I set ulimit -l and -s to unlimited in .bashrc (all nodes).
I set memlock to unlimited in limits.conf (all nodes).
I have password-less ssh (public key sharing) on all the nodes.
I ran parallel jobs with 40x40x40, 40x40x50, and 40x40x60 grid points
successfully. However, when I use 40x40x80 grid points, I get the
following MPI error:

**********************************************************
Fatal error in PMPI_Wait: Other MPI error, error stack:
PMPI_Wait(183)............: MPI_Wait(request=0x34e83a4, status=0x7fff7b24c400) failed
MPIR_Wait_impl(77)........:
dequeue_and_set_error(596): Communication error with rank 8
**********************************************************
Given that I can run the same simulation with slightly fewer grid
points without any problem, I suspect this error is related to stack
size. What could be the problem?
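
One way to test the stack-size hypothesis is to check the limits that a
non-interactively launched process actually sees on each node, since
settings in .bashrc may not reach processes started over ssh. Below is a
minimal Python sketch, assuming Python is available on every node. The
function names are illustrative and not part of WRF or MPICH2.

```python
import resource
import socket


def format_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def report_limits():
    """Return (soft, hard) limits in bytes for stack size and locked memory."""
    report = {}
    for name, key in (("stack", resource.RLIMIT_STACK),
                      ("memlock", resource.RLIMIT_MEMLOCK)):
        soft, hard = resource.getrlimit(key)
        report[name] = (format_limit(soft), format_limit(hard))
    return report


if __name__ == "__main__":
    # Print one line per limit, tagged with the host name so that output
    # collected from several nodes can be told apart.
    host = socket.gethostname()
    for name, (soft, hard) in sorted(report_limits().items()):
        print("{} {}: soft={} hard={}".format(host, name, soft, hard))
```

Saving this as, say, check_limits.py (a hypothetical file name) on the
NFS share and running it through a non-interactive ssh command on each
node would show whether the "unlimited" settings are in effect for
remotely started processes, or only for interactive logins.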

Thanks,
Sukanta

-- 
Sukanta Basu
Associate Professor
North Carolina State University
http://www4.ncsu.edu/~sbasu5/

