[mpich-discuss] Problems Running WRF on Ubuntu 11.10, MPICH2

Sukanta Basu sukanta.basu at gmail.com
Thu Feb 9 08:22:33 CST 2012


Dear Gus,

I tried setting KMP_STACKSIZE in .bashrc. No luck.
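
For reference, the relevant lines in my .bashrc are roughly the following
(16m is just a starting value, following Gus's suggestion; I can raise it further):

export KMP_STACKSIZE=16m
ulimit -s unlimited
ulimit -l unlimited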

I should mention that I am using NFS to share a directory containing
WRF across the nodes. I have followed:
https://help.ubuntu.com/community/MpichCluster

All the compilers are installed on individual nodes in an identical manner.

Changing .bashrc on the master node is immediately reflected on the
client nodes. I have passwordless ssh. The sshd_config files include
UseDNS no.
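
I will also double-check that the variable actually reaches the remote
processes (and not just the shared .bashrc file); something like the
following should print it from every rank, since printenv is a regular
executable that mpiexec can launch directly:

mpiexec -f mpd.hosts -n 8 printenv KMP_STACKSIZE
ssh crayN2-5150jo 'echo $KMP_STACKSIZE; ulimit -s'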

In summary,

(a) if I run: mpiexec -n 8 ./wrf.exe -> it runs fine on the local node
[irrespective of the problem size]
(b) if I run: mpiexec -f mpd.hosts -n 8 ./wrf.exe -> it runs fine for
small jobs; however, it produces MPI errors for larger runs.

We can use -n 2 (or any other number of processes); the behavior is identical.

Each individual node has 24 GB of RAM (more than enough for any of
these simulations).

Best regards,
Sukanta


On Thu, Feb 9, 2012 at 8:59 AM, Gustavo Correa <gus at ldeo.columbia.edu> wrote:
> Hi Basu
>
> Sorry, I missed the 'dmpar' information.
> I am not familiar with it, but I guess it is the Cray trick to make the CX1 machine
> look like a standard distributed memory environment?
> [As opposed to a full shared memory environment across all nodes,
> which would be 'smpar', right?]
>
> If 'dmpar' is a standard distributed memory environment, I presume all that
> I said before still holds.  I would just try to set KMP_STACKSIZE to 16m or more on
> all nodes, and run WRF again.
>
> FYI, I had some issues compiling some models with Intel 12.0, and in other mailing
> lists I saw people who had issues with version 12.1.
> However, I compiled some models correctly with Intel 11.1, but as I said before, not WRF.
>
> BTW, we're cheap here.  No funding for fancy machines, no Cray, no IBM, no SGI.
> The top thing we can buy is a standard Linux cluster once in a while. :)
>
> Good luck,
> Gus Correa
>
> On Feb 9, 2012, at 8:25 AM, Sukanta Basu wrote:
>
>> Hi Gus,
>>
>> Thanks for your email.
>>
>> I am compiling WRF with the dmpar option (distributed memory). WRF has a
>> different option for hybrid OpenMP+MPI (they call it dmpar+smpar). To
>> the best of my knowledge, OpenMP is not invoked.
>>
>> I do understand the distinction between OpenMP and Open MPI. Yesterday,
>> I uninstalled MPICH2 and installed Open MPI. I compiled and ran WRF
>> jobs. As I mentioned before, I faced different types of problems.
>>
>> I have been using WRF on various clusters for ~6-7 years. I bought a
>> Cray CX1 recently and am trying to set it up myself for running WRF
>> locally. Now I suspect that there are some compatibility issues
>> between WRF and the Intel Composer. I used to use the Intel 11.1 compiler.
>>
>> I will set KMP_STACKSIZE and re-run the simulations with WRF + MPICH2 + Intel.
>>
>> Best regards,
>> Sukanta
>>
>> On Thu, Feb 9, 2012 at 8:02 AM, Gustavo Correa <gus at ldeo.columbia.edu> wrote:
>>> Hi Sukanta
>>>
>>> Did you read the final part of my previous email about KMP_STACKSIZE?
>>> This is Intel's name for the OpenMP thread stack size.
>>> I think you misspelled that environment variable [it is not MP_STACKSIZE as your email says].
>>>
>>> Did you compile WRF with OpenMP turned on and with the Intel compiler?
>>> If you did, you certainly need to increase the threads' stack size as well.
>>>
>>> I had experiences similar to yours, with other models compiled with Intel ifort,
>>> and OpenMP, i.e., unexplained segmentation faults, even though the stacksize was
>>> set to unlimited.
>>>
>>> Some time ago I posted this same solution in this mailing list to somebody
>>> at LLNL or ANL, I think, who was having this type of problem as well.
>>> It is common in hybrid MPI+OpenMP programs.
>>>
>>> I would set KMP_STACKSIZE to at least 16m *on all nodes*, maybe in your .bashrc,
>>> or in the script that launches the job. I don't remember the syntax off the top of
>>> my head, but the MPICH2 mpiexec [hydra] probably has a way to export the environment
>>> variables to all processes. Check 'man mpiexec'.
>>> You must ensure that the environment variable is set *on all nodes*.
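>>>
>>> For example, something like this might do it with the hydra mpiexec
>>> [I have not double-checked the exact flag here, so please verify it
>>> against 'man mpiexec']:
>>>
>>> mpiexec -genv KMP_STACKSIZE 16m -f hostfile -n 32 ./wrf.exe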
>>>
>>> You may need more than 16m, depending on how fine a grid you are using.
>>> In another model here I had to use 512m, but this also depends
>>> on how much memory/RAM your nodes have available per core.
>>> You could try increasing it step by step, say, doubling each  time:
>>> 16m, 32m, 64m, ...
>>>
>>> Anyway, this is a guess based on what happened here.
>>> There is no guarantee that it will work, although it may be worth trying it.
>>> The problem you see may also be a bug in WRF, or an input/forcing file that is missing, etc.
>>>
>>> I hope this helps,
>>> Gus Correa
>>>
>>> PS - Note:  Just to avoid confusion with names.
>>> OpenMP and OpenMPI  [or Open MPI] are different things.
>>> The former is the thread-based standard for parallelization:
>>> http://openmp.org/wp/
>>> The latter is another open-source MPI implementation, like MPICH2:
>>> http://www.open-mpi.org/
>>>
>>>
>>> On Feb 8, 2012, at 10:33 PM, Sukanta Basu wrote:
>>>
>>>> Hi Gus,
>>>>
>>>> I tried setting the stack option in limits.conf. No change. I logged
>>>> on to each node and checked that the ulimit is indeed unlimited.
>>>>
>>>> I just installed Open MPI and recompiled WRF. It now runs with any
>>>> array size. However, I have a different problem. Now, one of the
>>>> processes quits suddenly during the run (with a segmentation fault
>>>> error). I think both the mpich2 and openmpi problems are somewhat
>>>> related.
>>>>
>>>> Best regards,
>>>> Sukanta
>>>>
>>>> On Wed, Feb 8, 2012 at 6:20 PM, Gustavo Correa <gus at ldeo.columbia.edu> wrote:
>>>>> Hi Sukanta
>>>>>
>>>>> Did you set the stacksize [not only memlock] to unlimited in
>>>>> /etc/security/limits.conf on all nodes?
>>>>>
>>>>> Not sure this will work, but you could try to run 'ulimit -s'  and 'ulimit -l' via mpiexec, just to check:
>>>>>
>>>>> mpiexec -prepend-rank -f hostfile -np 32 ulimit -s
>>>>> mpiexec -prepend-rank -f hostfile -np 32 ulimit -l
>>>>>
>>>>> Or just login to each node and check.
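>>>>>
>>>>> [If mpiexec refuses to run 'ulimit' directly, because it is a shell builtin
>>>>> rather than a binary, wrapping it in a shell should work, e.g.:
>>>>> mpiexec -prepend-rank -f hostfile -np 32 bash -c 'ulimit -s; ulimit -l' ]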
>>>>>
>>>>> Also, if your WRF is compiled with OpenMP,
>>>>> I think the Intel-specific environment variable for OMP_STACKSIZE is
>>>>> KMP_STACKSIZE [not MP_STACKSIZE], although they should also accept
>>>>> the portable/standard OMP_STACKSIZE [but I don't know if they do].
>>>>> For some models here I had to make it as big as 512m [I don't run WRF, though].
>>>>> 'man ifort' should tell more about it [at the end of the man page].
>>>>>
>>>>> I hope this helps,
>>>>> Gus Correa
>>>>>
>>>>> On Feb 8, 2012, at 4:23 PM, Anthony Chan wrote:
>>>>>
>>>>>>
>>>>>> There is fpi, the Fortran counterpart of cpi; you can try that.
>>>>>> Also, there is the MPICH2 test suite, located in
>>>>>> mpich2-xxx/test/mpi, which can be invoked with "make testing".
>>>>>> It is unlikely those tests will reveal anything, though:
>>>>>> the test suite is meant to test the MPI implementation,
>>>>>> not your app.
>>>>>>
>>>>>> As you said earlier, your difficulty running WRF
>>>>>> with the larger dataset is memory related. You should contact the WRF
>>>>>> mailing list for more pointers.
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> Hi Anthony,
>>>>>>>
>>>>>>> Is there any other MPI example code (other than cpi.c) that I could
>>>>>>> test that would give me more information about my MPICH2 setup?
>>>>>>>
>>>>>>> Here is the output from cpi (using 32 cores on 4 nodes):
>>>>>>>
>>>>>>> mpiuser at crayN1-5150jo:~/Misc$ mpiexec -f mpd.hosts -n 32 ./cpi
>>>>>>> Process 1 on crayN1-5150jo
>>>>>>> Process 18 on crayN2-5150jo
>>>>>>> Process 2 on crayN2-5150jo
>>>>>>> Process 26 on crayN2-5150jo
>>>>>>> Process 5 on crayN1-5150jo
>>>>>>> Process 14 on crayN2-5150jo
>>>>>>> Process 21 on crayN1-5150jo
>>>>>>> Process 22 on crayN2-5150jo
>>>>>>> Process 25 on crayN1-5150jo
>>>>>>> Process 6 on crayN2-5150jo
>>>>>>> Process 9 on crayN1-5150jo
>>>>>>> Process 17 on crayN1-5150jo
>>>>>>> Process 30 on crayN2-5150jo
>>>>>>> Process 10 on crayN2-5150jo
>>>>>>> Process 29 on crayN1-5150jo
>>>>>>> Process 13 on crayN1-5150jo
>>>>>>> Process 8 on crayN3-5150jo
>>>>>>> Process 20 on crayN3-5150jo
>>>>>>> Process 4 on crayN3-5150jo
>>>>>>> Process 12 on crayN3-5150jo
>>>>>>> Process 0 on crayN3-5150jo
>>>>>>> Process 24 on crayN3-5150jo
>>>>>>> Process 16 on crayN3-5150jo
>>>>>>> Process 28 on crayN3-5150jo
>>>>>>> Process 3 on crayN4-5150jo
>>>>>>> Process 7 on crayN4-5150jo
>>>>>>> Process 11 on crayN4-5150jo
>>>>>>> Process 23 on crayN4-5150jo
>>>>>>> Process 27 on crayN4-5150jo
>>>>>>> Process 31 on crayN4-5150jo
>>>>>>> Process 19 on crayN4-5150jo
>>>>>>> Process 15 on crayN4-5150jo
>>>>>>> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
>>>>>>> wall clock time = 0.009401
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Sukanta
>>>>>>>
>>>>>>> On Wed, Feb 8, 2012 at 1:19 PM, Anthony Chan <chan at mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>> Hmm.. Not sure what is happening.. I don't see anything
>>>>>>>> obviously wrong in your mpiexec verbose output (though
>>>>>>>> I am not a hydra expert). Your code is now killed by a
>>>>>>>> segmentation fault. Naively, I would recompile WRF with -g
>>>>>>>> and use a debugger to see where the segfault is. If you don't want
>>>>>>>> to mess around with the WRF source code, you may want to contact the WRF
>>>>>>>> developers to see if they have encountered a similar problem
>>>>>>>> before.
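>>>>>>>>
>>>>>>>> As a rough sketch (the ifort flags below are the usual Intel ones, but
>>>>>>>> where exactly they go in your WRF configure file is something you would
>>>>>>>> need to check):
>>>>>>>>
>>>>>>>> # rebuild with debug info and runtime tracebacks: add -g -traceback to the
>>>>>>>> # Fortran flags, then allow core dumps and inspect one after a crash:
>>>>>>>> ulimit -c unlimited
>>>>>>>> gdb ./wrf.exe core    # core file name may differ; 'bt' prints the backtrace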
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> Dear Anthony,
>>>>>>>>>
>>>>>>>>> Thanks for your response. Yes, I did try MP_STACK_SIZE and
>>>>>>>>> OMP_STACKSIZE. The error is still there. I have attached a log file
>>>>>>>>> (I ran mpiexec with the -verbose option). Maybe this will help.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Sukanta
>>>>>>>>>
>>>>>>>>> On Tue, Feb 7, 2012 at 3:28 PM, Anthony Chan <chan at mcs.anl.gov>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> I am not familiar with WRF, and I am not sure if WRF uses any threads
>>>>>>>>>> in dmpar mode. Did you try setting MP_STACK_SIZE or OMP_STACKSIZE?
>>>>>>>>>>
>>>>>>>>>> see: http://forum.wrfforum.com/viewtopic.php?f=6&t=255
>>>>>>>>>>
>>>>>>>>>> A.Chan
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am using a small cluster of 4 nodes (each with 8 cores + 24 GB
>>>>>>>>>>> RAM).
>>>>>>>>>>> OS: Ubuntu 11.10. The cluster uses an NFS file system and GigE
>>>>>>>>>>> connections.
>>>>>>>>>>>
>>>>>>>>>>> I installed MPICH2 and ran the cpi.c program successfully.
>>>>>>>>>>>
>>>>>>>>>>> I installed WRF (http://www.wrf-model.org/index.php) using the
>>>>>>>>>>> Intel compilers (dmpar option)
>>>>>>>>>>> I set ulimit -l and -s to be unlimited in .bashrc (all nodes)
>>>>>>>>>>> I set memlock to be unlimited in limits.conf (all nodes)
>>>>>>>>>>> I have password-less ssh (public key sharing) on all the nodes
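>>>>>>>>>>>
>>>>>>>>>>> For reference, the limits.conf entries are of this form:
>>>>>>>>>>> # /etc/security/limits.conf  (the '*' domain here is just a sketch)
>>>>>>>>>>> *    soft    memlock    unlimited
>>>>>>>>>>> *    hard    memlock    unlimited
>>>>>>>>>>>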
>>>>>>>>>>> I ran parallel jobs with 40x40x40, 40x40x50, and 40x40x60 grid points
>>>>>>>>>>> successfully. However, when I use 40x40x80 grid points, I get the
>>>>>>>>>>> following MPI error:
>>>>>>>>>>>
>>>>>>>>>>> **********************************************************
>>>>>>>>>>> Fatal error in PMPI_Wait: Other MPI error, error stack:
>>>>>>>>>>> PMPI_Wait(183)............: MPI_Wait(request=0x34e83a4,
>>>>>>>>>>> status=0x7fff7b24c400) failed
>>>>>>>>>>> MPIR_Wait_impl(77)........:
>>>>>>>>>>> dequeue_and_set_error(596): Communication error with rank 8
>>>>>>>>>>> **********************************************************
>>>>>>>>>>> Given that I can run the exact same simulation with a slightly smaller
>>>>>>>>>>> number of grid points without any problem, I suspect this error is
>>>>>>>>>>> related to stack size. What could be the problem?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sukanta
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Sukanta Basu
>>>>>>>>>>> Associate Professor
>>>>>>>>>>> North Carolina State University
>>>>>>>>>>> http://www4.ncsu.edu/~sbasu5/
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>>>>>>>>>>> To manage subscription options or unsubscribe:
>>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>>>>>>> _______________________________________________
>>>>>>>>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>>>>>>>>>> To manage subscription options or unsubscribe:
>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sukanta Basu
>>>>>>>>> Associate Professor
>>>>>>>>> North Carolina State University
>>>>>>>>> http://www4.ncsu.edu/~sbasu5/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sukanta Basu
>>>>>>> Associate Professor
>>>>>>> North Carolina State University
>>>>>>> http://www4.ncsu.edu/~sbasu5/
>>>>>> _______________________________________________
>>>>>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>>>>> To manage subscription options or unsubscribe:
>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>>
>>>>> _______________________________________________
>>>>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>>>> To manage subscription options or unsubscribe:
>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>
>>>>
>>>>
>>>> --
>>>> Sukanta Basu
>>>> Associate Professor
>>>> North Carolina State University
>>>> http://www4.ncsu.edu/~sbasu5/
>>>> _______________________________________________
>>>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>>> To manage subscription options or unsubscribe:
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>
>>> _______________________________________________
>>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>>> To manage subscription options or unsubscribe:
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>>
>>
>> --
>> Sukanta Basu
>> Associate Professor
>> North Carolina State University
>> http://www4.ncsu.edu/~sbasu5/
>> _______________________________________________
>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss



-- 
Sukanta Basu
Associate Professor
North Carolina State University
http://www4.ncsu.edu/~sbasu5/

