[mpich-discuss] Problems Running WRF on Ubuntu 11.10, MPICH2

Sukanta Basu sukanta.basu at gmail.com
Tue Feb 7 15:25:24 CST 2012


Dear Anthony,

Thanks for your response. Yes, I did try MP_STACK_SIZE and
OMP_STACKSIZE. The error is still there. I have attached a log file (I
ran mpiexec with the -verbose option). Maybe this will help.

Best regards,
Sukanta

On Tue, Feb 7, 2012 at 3:28 PM, Anthony Chan <chan at mcs.anl.gov> wrote:
>
> I am not familiar with WRF, and I am not sure whether WRF uses any threads
> in dmpar mode. Did you try setting MP_STACK_SIZE or OMP_STACKSIZE?
>
> see: http://forum.wrfforum.com/viewtopic.php?f=6&t=255
>
> A.Chan
>
> ----- Original Message -----
>> Hi,
>>
>> I am using a small cluster of 4 nodes (each with 8 cores + 24 GB RAM).
>> OS: Ubuntu 11.10. The cluster uses an NFS file system and gigabit
>> Ethernet (GigE) connections.
>>
>> I installed MPICH2 and ran the cpi.c example program successfully.
>>
>> I installed WRF (http://www.wrf-model.org/index.php) using the Intel
>> compilers (dmpar option).
>> I set ulimit -l and ulimit -s to unlimited in .bashrc (all nodes).
>> I set memlock to unlimited in limits.conf (all nodes).
>> I have passwordless ssh (public-key authentication) on all the nodes.
>> I ran parallel jobs with 40x40x40, 40x40x50, and 40x40x60 grid points
>> successfully. However, when I use 40x40x80 grid points, I get the
>> following MPI error:
>>
>> **********************************************************
>> Fatal error in PMPI_Wait: Other MPI error, error stack:
>> PMPI_Wait(183)............: MPI_Wait(request=0x34e83a4,
>> status=0x7fff7b24c400) failed
>> MPIR_Wait_impl(77)........:
>> dequeue_and_set_error(596): Communication error with rank 8
>> **********************************************************
>> Given that I can run the exact same simulation with a slightly smaller
>> number of grid points without any problem, I suspect this error is
>> related to stack size. What could be the problem?
>>
>> Thanks,
>> Sukanta
>>
>> --
>> Sukanta Basu
>> Associate Professor
>> North Carolina State University
>> http://www4.ncsu.edu/~sbasu5/
>> _______________________________________________
>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss



-------------- next part --------------
A non-text attachment was scrubbed...
Name: wrf.log
Type: text/x-log
Size: 39105 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20120207/8f3e603c/attachment-0001.bin>

