[mpich-discuss] Problems Running WRF on Ubuntu 11.10, MPICH2

Gustavo Correa gus at ldeo.columbia.edu
Thu Feb 9 07:02:35 CST 2012


Hi Sukanta

Did you read the final part of my previous email about KMP_STACKSIZE?
This is Intel's name for the OpenMP thread stack size.
I think you misspelled that environment variable [it is KMP_STACKSIZE, not MP_STACKSIZE as in your email].

Did you compile WRF with OpenMP turned on and with the Intel compiler?
If you did, you certainly need to increase the threads' stack size as well.

I have had experiences similar to yours with other models compiled with Intel ifort
and OpenMP, i.e., unexplained segmentation faults even though the stack size was
set to unlimited.

Some time ago I posted this same solution on this mailing list to somebody
at LLNL or ANL, I think, who was having this type of problem as well.
It is common in hybrid MPI+OpenMP programs.

I would set KMP_STACKSIZE to at least 16m *on all nodes*, maybe in your .bashrc, or in the script that launches the job.
I don't remember the syntax off the top of my head, but the MPICH2 mpiexec [hydra]
probably has a way to export environment variables to all processes.  Check 'man mpiexec'.
You must ensure that the environment variable is set *on all nodes*.
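
Something along these lines should do it [just a rough sketch, assuming bash and the
hydra mpiexec; substitute whatever binary you actually launch for 'wrf.exe']:

  # in ~/.bashrc on every node
  export KMP_STACKSIZE=16m

  # or passed at launch time through mpiexec's -genv option
  mpiexec -genv KMP_STACKSIZE 16m -f hostfile -np 32 ./wrf.exe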

You may need more than 16m, depending on how fine a grid you are using.
For another model here I had to use 512m, but this also depends
on how much memory/RAM your nodes have available per core.
You could try increasing it step by step, say, doubling each time:
16m, 32m, 64m, ...
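
Whatever value you pick, you can double-check that it actually reaches the remote
ranks by echoing it through mpiexec [again just a sketch, assuming bash]:

  mpiexec -prepend-rank -f hostfile -np 32 bash -c 'echo $KMP_STACKSIZE'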

Anyway, this is a guess based on what happened here.
There is no guarantee that it will work, although it may be worth a try.
The problem you see may also be a bug in WRF, a missing input/forcing file, etc.

I hope this helps,
Gus Correa

PS - Note: just to avoid confusion over names.
OpenMP and OpenMPI [or Open MPI] are different things.
The former is the thread-based standard for parallelization:
http://openmp.org/wp/
The latter is another open-source MPI implementation, like MPICH2:
http://www.open-mpi.org/


On Feb 8, 2012, at 10:33 PM, Sukanta Basu wrote:

> Hi Gus,
> 
> I tried setting the stack option in limits.conf. No change. I logged
> on to each node and checked that the ulimit is indeed unlimited.
> 
> I just installed openmpi and recompiled WRF. It now runs with any
> array size. However, I have a different problem: now one of the
> processes quits suddenly during the run (with a segmentation fault
> error). I think the mpich2 and openmpi problems are somewhat
> related.
> 
> Best regards,
> Sukanta
> 
> On Wed, Feb 8, 2012 at 6:20 PM, Gustavo Correa <gus at ldeo.columbia.edu> wrote:
>> Hi Sukanta
>> 
>> Did you set the stacksize [not only memlock] to unlimited in
>> /etc/security/limits.conf on all nodes?
>> 
>> Not sure this will work, but you could try to run 'ulimit -s' and 'ulimit -l' via mpiexec, just to check
>> [ulimit is a shell builtin, so it has to go through a shell]:
>> 
>> mpiexec -prepend-rank -f hostfile -np 32 bash -c 'ulimit -s'
>> mpiexec -prepend-rank -f hostfile -np 32 bash -c 'ulimit -l'
>> 
>> Or just login to each node and check.
>> 
>> Also, if your WRF is compiled with OpenMP,
>> I think the Intel-specific environment variable for OMP_STACKSIZE is
>> KMP_STACKSIZE [not MP_STACKSIZE], although they should also accept
>> the portable/standard OMP_STACKSIZE [but I don't know if they do].
>> For some models here I had to make it as big as 512m [I don't run WRF, though].
>> 'man ifort' should tell you more about it [at the end of the man page].
>> 
>> I hope this helps,
>> Gus Correa
>> 
>> On Feb 8, 2012, at 4:23 PM, Anthony Chan wrote:
>> 
>>> 
>>> There is fpi, the Fortran counterpart of cpi; you can try that.
>>> There is also the MPICH2 test suite, located in
>>> mpich2-xxx/test/mpi, which can be invoked with "make testing".
>>> It is unlikely those tests will reveal anything, though:
>>> the test suite is meant to test the MPI implementation,
>>> not your app.
>>> 
>>> As you said earlier, your difficulty in running WRF
>>> with the larger dataset is memory related.  You should contact the WRF
>>> mailing list for more pointers.
>>> 
>>> ----- Original Message -----
>>>> Hi Anthony,
>>>> 
>>>> Is there any other mpi example code (other than cpi.c) that I could
>>>> test which will give me more information about my mpich setup?
>>>> 
>>>> Here is the output from cpi (using 32 cores on 4 nodes):
>>>> 
>>>> mpiuser at crayN1-5150jo:~/Misc$ mpiexec -f mpd.hosts -n 32 ./cpi
>>>> Process 1 on crayN1-5150jo
>>>> Process 18 on crayN2-5150jo
>>>> Process 2 on crayN2-5150jo
>>>> Process 26 on crayN2-5150jo
>>>> Process 5 on crayN1-5150jo
>>>> Process 14 on crayN2-5150jo
>>>> Process 21 on crayN1-5150jo
>>>> Process 22 on crayN2-5150jo
>>>> Process 25 on crayN1-5150jo
>>>> Process 6 on crayN2-5150jo
>>>> Process 9 on crayN1-5150jo
>>>> Process 17 on crayN1-5150jo
>>>> Process 30 on crayN2-5150jo
>>>> Process 10 on crayN2-5150jo
>>>> Process 29 on crayN1-5150jo
>>>> Process 13 on crayN1-5150jo
>>>> Process 8 on crayN3-5150jo
>>>> Process 20 on crayN3-5150jo
>>>> Process 4 on crayN3-5150jo
>>>> Process 12 on crayN3-5150jo
>>>> Process 0 on crayN3-5150jo
>>>> Process 24 on crayN3-5150jo
>>>> Process 16 on crayN3-5150jo
>>>> Process 28 on crayN3-5150jo
>>>> Process 3 on crayN4-5150jo
>>>> Process 7 on crayN4-5150jo
>>>> Process 11 on crayN4-5150jo
>>>> Process 23 on crayN4-5150jo
>>>> Process 27 on crayN4-5150jo
>>>> Process 31 on crayN4-5150jo
>>>> Process 19 on crayN4-5150jo
>>>> Process 15 on crayN4-5150jo
>>>> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
>>>> wall clock time = 0.009401
>>>> 
>>>> Best regards,
>>>> Sukanta
>>>> 
>>>> On Wed, Feb 8, 2012 at 1:19 PM, Anthony Chan <chan at mcs.anl.gov> wrote:
>>>>> 
>>>>> Hmm.. Not sure what is happening. I don't see anything
>>>>> obviously wrong in your mpiexec verbose output (though
>>>>> I am not a hydra expert). Your code is now killed by a
>>>>> segmentation fault. Naively, I would recompile WRF with -g
>>>>> and use a debugger to see where the segfault is. If you don't want
>>>>> to mess around with the WRF source code, you may want to contact the WRF
>>>>> developers to see if they have encountered a similar problem
>>>>> before.
>>>>> 
>>>>> ----- Original Message -----
>>>>>> Dear Anthony,
>>>>>> 
>>>>>> Thanks for your response. Yes, I did try MP_STACK_SIZE and
>>>>>> OMP_STACKSIZE. The error is still there. I have attached a log file
>>>>>> (I ran mpiexec with the -verbose option). Maybe this will help.
>>>>>> 
>>>>>> Best regards,
>>>>>> Sukanta
>>>>>> 
>>>>>> On Tue, Feb 7, 2012 at 3:28 PM, Anthony Chan <chan at mcs.anl.gov>
>>>>>> wrote:
>>>>>>> 
>>>>>>> I am not familiar with WRF, and not sure if WRF uses any threads
>>>>>>> in dmpar mode. Did you try setting MP_STACK_SIZE or OMP_STACKSIZE?
>>>>>>> 
>>>>>>> see: http://forum.wrfforum.com/viewtopic.php?f=6&t=255
>>>>>>> 
>>>>>>> A.Chan
>>>>>>> 
>>>>>>> ----- Original Message -----
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I am using a small cluster of 4 nodes (each with 8 cores + 24 GB
>>>>>>>> RAM).
>>>>>>>> OS: Ubuntu 11.10. The cluster uses an NFS file system and GigE
>>>>>>>> connections.
>>>>>>>> 
>>>>>>>> I installed mpich2 and ran cpi.c program successfully.
>>>>>>>> 
>>>>>>>> I installed WRF (http://www.wrf-model.org/index.php) using the
>>>>>>>> intel
>>>>>>>> compilers (dmpar option)
>>>>>>>> I set ulimit -l and -s to be unlimited in .bashrc (all nodes)
>>>>>>>> I set memlock to be unlimited in limits.conf (all nodes)
>>>>>>>> I have password-less ssh (public key sharing) on all the nodes
>>>>>>>> I ran parallel jobs with 40x40x40, 40x40x50, and 40x40x60 grid
>>>>>>>> points successfully. However, when I use 40x40x80 grid points,
>>>>>>>> I get the following MPI error:
>>>>>>>> 
>>>>>>>> **********************************************************
>>>>>>>> Fatal error in PMPI_Wait: Other MPI error, error stack:
>>>>>>>> PMPI_Wait(183)............: MPI_Wait(request=0x34e83a4,
>>>>>>>> status=0x7fff7b24c400) failed
>>>>>>>> MPIR_Wait_impl(77)........:
>>>>>>>> dequeue_and_set_error(596): Communication error with rank 8
>>>>>>>> **********************************************************
>>>>>>>> Given that I can run the exact same simulation with a slightly smaller
>>>>>>>> number of grid points without any problem, I suspect this error is
>>>>>>>> related to stack size. What could be the problem?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Sukanta
>>>>>>>> 
> 
> 
> 
> -- 
> Sukanta Basu
> Associate Professor
> North Carolina State University
> http://www4.ncsu.edu/~sbasu5/


