[mpich-discuss] Problems Running WRF on Ubuntu 11.10, MPICH2

Sukanta Basu sukanta.basu at gmail.com
Thu Feb 9 07:25:34 CST 2012


Hi Gus,

Thanks for your email.

I am compiling WRF with the dmpar option (distributed memory). WRF has a
different option for hybrid OpenMP+MPI (they call it dmpar+smpar). To
the best of my knowledge, OpenMP is not invoked.

I do understand the distinction between OpenMP and Open MPI. Yesterday,
I uninstalled MPICH2 and installed Open MPI. I compiled and ran WRF
jobs. As I mentioned before, I ran into a different set of problems.

I have been using WRF on various clusters for ~6-7 years. I bought a
Cray CX1 recently and am trying to set it up myself for running WRF
locally. I now suspect there is some compatibility issue between
WRF and Intel Composer. I used to use the Intel 11.1 compiler.

I will set KMP_STACKSIZE and re-run the simulations with wrf+mpich2+intel.
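
For the record, this is roughly what I plan to do (just a sketch; the 64m
value, hostfile, and executable path are placeholders I will adjust):

# in ~/.bashrc on every node
export KMP_STACKSIZE=64m

# then relaunch as before, e.g.:
mpiexec -f mpd.hosts -n 32 ./wrf.exe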

Best regards,
Sukanta

On Thu, Feb 9, 2012 at 8:02 AM, Gustavo Correa <gus at ldeo.columbia.edu> wrote:
> Hi Sukanta
>
> Did you read the final part of my previous email about KMP_STACKSIZE?
> This is Intel's name for the OpenMP thread stack size setting.
> I think you misspelled that environment variable [it is not MP_STACKSIZE as your email says].
>
> Did you compile WRF with OpenMP turned on and with the Intel compiler?
> If you did, you certainly need to increase the threads' stack size as well.
>
> I had experiences similar to yours with other models compiled with Intel ifort
> and OpenMP, i.e., unexplained segmentation faults, even though the stack size was
> set to unlimited.
>
> Some time ago I posted this same solution in this mailing list to somebody
> at LLNL or ANL, I think, who was having this type of problem as well.
> It is common in hybrid MPI+OpenMP programs.
>
> I would set KMP_STACKSIZE to at least 16m *on all nodes*, maybe in your .bashrc,
> or in the script that launches the job.  I don't remember the syntax off the top
> of my head, but the MPICH2 mpiexec [hydra] probably has a way to export environment
> variables to all processes.  Check 'man mpiexec'.
> You must ensure that the environment variable is set *on all nodes*.
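>
> For example [untested by me, but hydra's -genv flag should export a
> variable to every rank; adjust the hostfile, process count, and
> executable path to your setup]:
>
> mpiexec -genv KMP_STACKSIZE 16m -f hostfile -np 32 ./wrf.exe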
>
> You may need more than 16m, depending on how fine a grid you are using.
> In another model here I had to use 512m, but this also depends
> on how much memory/RAM your nodes have available per core.
> You could try increasing it step by step, say, doubling each  time:
> 16m, 32m, 64m, ...
>
> Anyway, this is a guess based on what happened here.
> There is no guarantee that it will work, although it may be worth trying it.
> The problem you see may also be a bug in WRF, or an input/forcing file that is missing, etc.
>
> I hope this helps,
> Gus Correa
>
> PS - Note:  Just to avoid confusion with names.
> OpenMP and OpenMPI  [or Open MPI] are different things.
> The former is the thread-based standard for parallelization:
> http://openmp.org/wp/
> The latter is another open source  MPI, like MPICH2:
> http://www.open-mpi.org/
>
>
> On Feb 8, 2012, at 10:33 PM, Sukanta Basu wrote:
>
>> Hi Gus,
>>
>> I tried setting the stack option in limits.conf. No change. I logged
>> on to each node and checked that the ulimit is indeed unlimited.
>>
>> I just installed Open MPI and recompiled WRF. It now runs with any
>> array size. However, I have a different problem: now, one of the
>> processes quits suddenly during the run (with a segmentation fault
>> error). I think the MPICH2 and Open MPI problems are somewhat
>> related.
>>
>> Best regards,
>> Sukanta
>>
>> On Wed, Feb 8, 2012 at 6:20 PM, Gustavo Correa <gus at ldeo.columbia.edu> wrote:
>>> Hi Sukanta
>>>
>>> Did you set the stacksize [not only memlock] to unlimited in
>>> /etc/security/limits.conf on all nodes?
>>>
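>>> Something like this in /etc/security/limits.conf on every node should do it
>>> [just a sketch; '*' applies to all users, and you have to log in again for
>>> it to take effect]:
>>>
>>> *    soft    stack      unlimited
>>> *    hard    stack      unlimited
>>> *    soft    memlock    unlimited
>>> *    hard    memlock    unlimited
>>>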
>>> Not sure this will work, but you could try to run 'ulimit -s' and 'ulimit -l' via
>>> mpiexec, just to check [wrapped in a shell, since ulimit is a shell builtin rather
>>> than an executable]:
>>>
>>> mpiexec -prepend-rank -f hostfile -np 32 bash -c 'ulimit -s'
>>> mpiexec -prepend-rank -f hostfile -np 32 bash -c 'ulimit -l'
>>>
>>> Or just login to each node and check.
>>>
>>> Also, if your WRF is compiled with OpenMP,
>>> I think the Intel-specific equivalent of OMP_STACKSIZE is
>>> KMP_STACKSIZE [not MP_STACKSIZE], although the Intel compilers should also
>>> accept the portable/standard OMP_STACKSIZE [but I don't know if they do].
>>> For some models here I had to make it as big as 512m [I don't run WRF, though].
>>> 'man ifort' should tell you more about it [at the end of the man page].
>>>
>>> I hope this helps,
>>> Gus Correa
>>>
>>> On Feb 8, 2012, at 4:23 PM, Anthony Chan wrote:
>>>
>>>>
>>>> There is fpi, the Fortran counterpart of cpi; you can try that.
>>>> Also, there is the MPICH2 test suite, which is located in
>>>> mpich2-xxx/test/mpi and can be invoked by "make testing".
>>>> It is unlikely those tests will reveal anything, though:
>>>> the test suite is meant to test the MPI implementation,
>>>> not your app.
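>>>>
>>>> If you do want to run it, it is roughly [assuming you still have the
>>>> MPICH2 build tree around; "mpich2-xxx" is whatever version you unpacked]:
>>>>
>>>> cd mpich2-xxx/test/mpi
>>>> make testing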
>>>>
>>>> As you said earlier, your difficulty in running WRF
>>>> with a larger dataset is memory related.  You should contact the WRF
>>>> mailing list for more pointers.
>>>>
>>>> ----- Original Message -----
>>>>> Hi Anthony,
>>>>>
>>>>> Is there any other MPI example code (other than cpi.c) that I could
>>>>> test that would give me more information about my MPICH2 setup?
>>>>>
>>>>> Here is the output from cpi (using 32 cores on 4 nodes):
>>>>>
>>>>> mpiuser at crayN1-5150jo:~/Misc$ mpiexec -f mpd.hosts -n 32 ./cpi
>>>>> Process 1 on crayN1-5150jo
>>>>> Process 18 on crayN2-5150jo
>>>>> Process 2 on crayN2-5150jo
>>>>> Process 26 on crayN2-5150jo
>>>>> Process 5 on crayN1-5150jo
>>>>> Process 14 on crayN2-5150jo
>>>>> Process 21 on crayN1-5150jo
>>>>> Process 22 on crayN2-5150jo
>>>>> Process 25 on crayN1-5150jo
>>>>> Process 6 on crayN2-5150jo
>>>>> Process 9 on crayN1-5150jo
>>>>> Process 17 on crayN1-5150jo
>>>>> Process 30 on crayN2-5150jo
>>>>> Process 10 on crayN2-5150jo
>>>>> Process 29 on crayN1-5150jo
>>>>> Process 13 on crayN1-5150jo
>>>>> Process 8 on crayN3-5150jo
>>>>> Process 20 on crayN3-5150jo
>>>>> Process 4 on crayN3-5150jo
>>>>> Process 12 on crayN3-5150jo
>>>>> Process 0 on crayN3-5150jo
>>>>> Process 24 on crayN3-5150jo
>>>>> Process 16 on crayN3-5150jo
>>>>> Process 28 on crayN3-5150jo
>>>>> Process 3 on crayN4-5150jo
>>>>> Process 7 on crayN4-5150jo
>>>>> Process 11 on crayN4-5150jo
>>>>> Process 23 on crayN4-5150jo
>>>>> Process 27 on crayN4-5150jo
>>>>> Process 31 on crayN4-5150jo
>>>>> Process 19 on crayN4-5150jo
>>>>> Process 15 on crayN4-5150jo
>>>>> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
>>>>> wall clock time = 0.009401
>>>>>
>>>>> Best regards,
>>>>> Sukanta
>>>>>
>>>>> On Wed, Feb 8, 2012 at 1:19 PM, Anthony Chan <chan at mcs.anl.gov> wrote:
>>>>>>
>>>>>> Hmm.. not sure what is happening. I don't see anything
>>>>>> obviously wrong in your mpiexec verbose output (though
>>>>>> I am not a hydra expert). Your code is now killed by a
>>>>>> segmentation fault. Naively, I would recompile WRF with -g
>>>>>> and use a debugger to see where the segfault is. If you don't want
>>>>>> to mess around with the WRF source code, you may want to contact the WRF
>>>>>> developers to see if they have encountered a similar problem
>>>>>> before.
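>>>>>>
>>>>>> For instance, something along these lines [just a sketch, not WRF-specific;
>>>>>> where exactly the flags go depends on WRF's build configuration, and the
>>>>>> core file name depends on your system settings]:
>>>>>>
>>>>>> # add -g -traceback to the Fortran compile flags, then rebuild
>>>>>> ulimit -c unlimited                  # allow core dumps on each node
>>>>>> mpiexec -f hostfile -n 32 ./wrf.exe
>>>>>> gdb ./wrf.exe core                   # then "bt" for the backtrace
>>>>>>
>>>>>> With ifort, -traceback alone should also print a Fortran stack trace
>>>>>> when the segfault happens, even without a debugger.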
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> Dear Anthony,
>>>>>>>
>>>>>>> Thanks for your response. Yes, I did try MP_STACK_SIZE and
>>>>>>> OMP_STACKSIZE. The error is still there. I have attached a log file
>>>>>>> (I ran mpiexec with the -verbose option). Maybe this will help.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Sukanta
>>>>>>>
>>>>>>> On Tue, Feb 7, 2012 at 3:28 PM, Anthony Chan <chan at mcs.anl.gov>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I am not familiar with WRF, and not sure if WRF uses any threads
>>>>>>>> in dmpar mode. Did you try setting MP_STACK_SIZE or OMP_STACKSIZE?
>>>>>>>>
>>>>>>>> see: http://forum.wrfforum.com/viewtopic.php?f=6&t=255
>>>>>>>>
>>>>>>>> A.Chan
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I am using a small cluster of 4 nodes (each with 8 cores + 24 GB
>>>>>>>>> RAM).
>>>>>>>>> OS: Ubuntu 11.10. The cluster uses an NFS file system and GigE
>>>>>>>>> connections.
>>>>>>>>>
>>>>>>>>> I installed MPICH2 and ran the cpi.c program successfully.
>>>>>>>>>
>>>>>>>>> I installed WRF (http://www.wrf-model.org/index.php) using the Intel
>>>>>>>>> compilers (dmpar option)
>>>>>>>>> I set ulimit -l and -s to be unlimited in .bashrc (all nodes)
>>>>>>>>> I set memlock to be unlimited in limits.conf (all nodes)
>>>>>>>>> I have password-less ssh (public key sharing) on all the nodes
>>>>>>>>> I ran parallel jobs with 40x40x40, 40x40x50, and 40x40x60 grid points
>>>>>>>>> successfully. However, when I use 40x40x80 grid points, I get the
>>>>>>>>> following MPI error:
>>>>>>>>>
>>>>>>>>> **********************************************************
>>>>>>>>> Fatal error in PMPI_Wait: Other MPI error, error stack:
>>>>>>>>> PMPI_Wait(183)............: MPI_Wait(request=0x34e83a4,
>>>>>>>>> status=0x7fff7b24c400) failed
>>>>>>>>> MPIR_Wait_impl(77)........:
>>>>>>>>> dequeue_and_set_error(596): Communication error with rank 8
>>>>>>>>> **********************************************************
>>>>>>>>> Given that I can run the exact same simulation with a slightly smaller
>>>>>>>>> number of grid points without any problem, this error seems to be related
>>>>>>>>> to stack size. What could be the problem?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sukanta
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sukanta Basu
>>>>>>>>> Associate Professor
>>>>>>>>> North Carolina State University
>>>>>>>>> http://www4.ncsu.edu/~sbasu5/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sukanta Basu
>>>>>>> Associate Professor
>>>>>>> North Carolina State University
>>>>>>> http://www4.ncsu.edu/~sbasu5/
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sukanta Basu
>>>>> Associate Professor
>>>>> North Carolina State University
>>>>> http://www4.ncsu.edu/~sbasu5/
>>>
>>
>>
>>
>> --
>> Sukanta Basu
>> Associate Professor
>> North Carolina State University
>> http://www4.ncsu.edu/~sbasu5/
>



-- 
Sukanta Basu
Associate Professor
North Carolina State University
http://www4.ncsu.edu/~sbasu5/

