[mpich-discuss] Problems Running WRF on Ubuntu 11.10, MPICH2

Anthony Chan chan at mcs.anl.gov
Tue Feb 7 14:28:59 CST 2012


I am not familiar with WRF, and I am not sure whether WRF uses any threads
in dmpar mode.  Did you try setting MP_STACK_SIZE or OMP_STACKSIZE?

see: http://forum.wrfforum.com/viewtopic.php?f=6&t=255
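
Before changing anything, it may help to confirm which stack settings the
MPI-launched processes actually see, since limits set in .bashrc are not
always picked up by processes started over non-interactive ssh. The
following is only a minimal sketch, not part of WRF or MPICH2: the script
name stack_report.py and the helper names fmt_limit and stack_report are
made up for illustration. It prints the stack and memlock limits plus the
two environment variables mentioned above.

```python
import os
import resource
import socket


def fmt_limit(value):
    """Render an rlimit value in KiB, mapping RLIM_INFINITY to 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def stack_report(env=None):
    """Return a one-line summary of the limits seen by this process.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    stack_soft, stack_hard = resource.getrlimit(resource.RLIMIT_STACK)
    lock_soft, _ = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    parts = [
        f"host={socket.gethostname()}",
        f"stack_soft={fmt_limit(stack_soft)}",
        f"stack_hard={fmt_limit(stack_hard)}",
        f"memlock_soft={fmt_limit(lock_soft)}",
    ]
    # The two variables suggested in this thread; '<unset>' means the
    # launched process did not inherit them.
    for name in ("OMP_STACKSIZE", "MP_STACK_SIZE"):
        parts.append(f"{name}={env.get(name, '<unset>')}")
    return " ".join(parts)


if __name__ == "__main__":
    print(stack_report())
```

Launching this with the same mpiexec command line and host list used for
WRF (in place of the WRF executable) should print one line per process. If
any node reports a finite stack_soft value, the ulimit settings are not
reaching the processes started on that node.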

A.Chan

----- Original Message -----
> Hi,
> 
> I am using a small cluster of 4 nodes (each with 8 cores + 24 GB RAM).
> OS: Ubuntu 11.10. The cluster uses nfs file system and gigE
> connections.
> 
> I installed mpich2 and ran cpi.c program successfully.
> 
> I installed WRF (http://www.wrf-model.org/index.php) using the intel
> compilers (dmpar option)
> I set ulimit -l and -s to be unlimited in .bashrc (all nodes)
> I set memlock to be unlimited in limits.conf (all nodes)
> I have password-less ssh (public key sharing) on all the nodes
> I ran parallel jobs with 40x40x40, 40x40x50, and 40x40x60 grid points
> successfully. However, when I utilize 40x40x80 grid points, I get the
> following MPI error:
> 
> **********************************************************
> Fatal error in PMPI_Wait: Other MPI error, error stack:
> PMPI_Wait(183)............: MPI_Wait(request=0x34e83a4,
> status=0x7fff7b24c400) failed
> MPIR_Wait_impl(77)........:
> dequeue_and_set_error(596): Communication error with rank 8
> **********************************************************
> Given that I can run the exact same simulation with slightly fewer
> grid points without any problem, I suspect this error is related to
> stack size. What could be the problem?
> 
> Thanks,
> Sukanta
> 
> --
> Sukanta Basu
> Associate Professor
> North Carolina State University
> http://www4.ncsu.edu/~sbasu5/
> _______________________________________________
> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

