[mpich-discuss] Problems Running WRF on Ubuntu 11.10, MPICH2
Gustavo Correa
gus at ldeo.columbia.edu
Thu Feb 9 07:59:18 CST 2012
Hi Basu
Sorry, I missed the 'dmpar' information.
I am not familiar with it, but I guess it is the Cray trick to make the CX1 machine
look like a standard distributed memory environment?
[As opposed to a full shared memory environment across all nodes,
which would be 'smpar', right?]
If 'dmpar' is a standard distributed memory environment, I presume all that
I said before still holds. I would just try setting KMP_STACKSIZE to 16m or more on
all nodes and running WRF again.
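For example [untested, just a sketch], a line like this in ~/.bashrc on each node would do it:

export KMP_STACKSIZE=16m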
FYI, I had some issues compiling some models with Intel 12.0, and on other mailing
lists I saw people who had issues with version 12.1.
However, I compiled some models with Intel 11.1 correctly, though, as I said before, not WRF.
BTW, we're cheap here. No funding for fancy machines, no Cray, no IBM, no SGI.
The top thing we can buy is a standard Linux cluster once in a while. :)
Good luck,
Gus Correa
On Feb 9, 2012, at 8:25 AM, Sukanta Basu wrote:
> Hi Gus,
>
> Thanks for your email.
>
> I am compiling WRF with the dmpar option (distributed memory). WRF has a
> different option for hybrid OpenMP+MPI (they call it dmpar+smpar). To
> the best of my knowledge, OpenMP is not invoked.
>
> I do understand the distinction between OpenMP and Open MPI. Yesterday,
> I uninstalled MPICH2 and installed Open MPI. I compiled and ran WRF
> jobs. As I mentioned before, I faced different types of problems.
>
> I have been using WRF on various clusters for ~6-7 years. I bought a
> Cray CX1 recently and am trying to set it up myself to run WRF
> locally. Now I suspect that there are some compatibility issues
> between WRF and the Intel Composer. I used to use the Intel 11.1 compiler.
>
> I will set KMP_STACKSIZE and re-run the simulations with wrf+mpich2+intel.
>
> Best regards,
> Sukanta
>
> On Thu, Feb 9, 2012 at 8:02 AM, Gustavo Correa <gus at ldeo.columbia.edu> wrote:
>> Hi Sukanta
>>
>> Did you read the final part of my previous email about KMP_STACKSIZE?
>> This is what Intel calls the OpenMP thread stack size.
>> I think you misspelled that environment variable [it is not MP_STACKSIZE as your email says].
>>
>> Did you compile WRF with OpenMP turned on and with the Intel compiler?
>> If you did, you certainly need to increase also the threads' stack size.
>>
>> I had experiences similar to yours with other models compiled with Intel ifort
>> and OpenMP, i.e., unexplained segmentation faults, even though the stack size was
>> set to unlimited.
>>
>> Some time ago I posted this same solution in this mailing list to somebody
>> at LLNL or ANL, I think, who was having this type of problem as well.
>> It is common in hybrid MPI+OpenMP programs.
>>
>> I would set KMP_STACKSIZE to 16m at least *on all nodes*, maybe in your .bashrc,
>> or in the script that launches the job. I don't remember the syntax off the top of my head,
>> but the MPICH2 mpiexec [hydra] probably has a way to export environment variables
>> to all processes. Check 'man mpiexec'.
>> You must ensure that the environment variable is set *on all nodes*.
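>> For example [I have not tested this], hydra's mpiexec has a -genv option that pushes an
>> environment variable to every rank, something along these lines:
>>
>> mpiexec -f hostfile -np 32 -genv KMP_STACKSIZE 16m ./wrf.exe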
>>
>> You may need more than 16m, depending on how fine a grid you are using.
>> In another model here I had to use 512m, but this also depends
>> on how much memory/RAM your nodes have available per core.
>> You could try increasing it step by step, say, doubling each time:
>> 16m, 32m, 64m, ...
>>
>> Anyway, this is a guess based on what happened here.
>> There is no guarantee that it will work, although it may be worth trying it.
>> The problem you see may also be a bug in WRF, or an input/forcing file that is missing, etc.
>>
>> I hope this helps,
>> Gus Correa
>>
>> PS - Note: Just to avoid confusion with names.
>> OpenMP and OpenMPI [or Open MPI] are different things.
>> The former is the thread-based standard for shared-memory parallelization:
>> http://openmp.org/wp/
>> The latter is another open-source MPI implementation, like MPICH2:
>> http://www.open-mpi.org/
>>
>>
>> On Feb 8, 2012, at 10:33 PM, Sukanta Basu wrote:
>>
>>> Hi Gus,
>>>
>>> I tried setting the stack option in limits.conf. No change. I logged
>>> on to each node and checked that ulimit is indeed unlimited.
>>>
>>> I just installed Open MPI and recompiled WRF. It now runs with any
>>> array size. However, I have a different problem. Now, one of the
>>> processes quits suddenly during the run (with a segmentation fault
>>> error). I think both the mpich2 and openmpi problems are somewhat
>>> related.
>>>
>>> Best regards,
>>> Sukanta
>>>
>>> On Wed, Feb 8, 2012 at 6:20 PM, Gustavo Correa <gus at ldeo.columbia.edu> wrote:
>>>> Hi Sukanta
>>>>
>>>> Did you set the stacksize [not only memlock] to unlimited in
>>>> /etc/security/limits.conf on all nodes?
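>>>> For example, entries like these in /etc/security/limits.conf [adjust to your site's policy]
>>>> would cover both stack and memlock:
>>>>
>>>> *    soft    stack      unlimited
>>>> *    hard    stack      unlimited
>>>> *    soft    memlock    unlimited
>>>> *    hard    memlock    unlimited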
>>>>
>>>> Not sure this will work, but you could try to run 'ulimit -s' and 'ulimit -l' via mpiexec, just to check:
>>>>
>>>> mpiexec -prepend-rank -f hostfile -np 32 bash -c 'ulimit -s'
>>>> mpiexec -prepend-rank -f hostfile -np 32 bash -c 'ulimit -l'
>>>>
>>>> Or just login to each node and check.
>>>>
>>>> Also, if your WRF is compiled with OpenMP,
>>>> I think the Intel-specific environment variable for OMP_STACKSIZE is
>>>> KMP_STACKSIZE [not MP_STACKSIZE], although they should also accept
>>>> the portable/standard OMP_STACKSIZE [but I don't know if they do].
>>>> For some models here I had to make it as big as 512m [I don't run WRF, though].
>>>> 'man ifort' should tell you more about it [at the end of the man page].
>>>>
>>>> I hope this helps,
>>>> Gus Correa
>>>>
>>>> On Feb 8, 2012, at 4:23 PM, Anthony Chan wrote:
>>>>
>>>>>
>>>>> There is fpi, the Fortran counterpart of cpi; you can try that.
>>>>> Also, there is the MPICH2 test suite, located in
>>>>> mpich2-xxx/test/mpi, which can be invoked by "make testing".
>>>>> It is unlikely those tests will reveal anything, though:
>>>>> the test suite is meant to test the MPI implementation,
>>>>> not your app.
>>>>>
>>>>> As you said earlier, your difficulty in running WRF
>>>>> with a larger dataset is memory related. You should contact the WRF
>>>>> mailing list for more pointers.
>>>>>
>>>>> ----- Original Message -----
>>>>>> Hi Anthony,
>>>>>>
>>>>>> Is there any other mpi example code (other than cpi.c) that I could
>>>>>> test which will give me more information about my mpich setup?
>>>>>>
>>>>>> Here is the output from cpi (using 32 cores on 4 nodes):
>>>>>>
>>>>>> mpiuser at crayN1-5150jo:~/Misc$ mpiexec -f mpd.hosts -n 32 ./cpi
>>>>>> Process 1 on crayN1-5150jo
>>>>>> Process 18 on crayN2-5150jo
>>>>>> Process 2 on crayN2-5150jo
>>>>>> Process 26 on crayN2-5150jo
>>>>>> Process 5 on crayN1-5150jo
>>>>>> Process 14 on crayN2-5150jo
>>>>>> Process 21 on crayN1-5150jo
>>>>>> Process 22 on crayN2-5150jo
>>>>>> Process 25 on crayN1-5150jo
>>>>>> Process 6 on crayN2-5150jo
>>>>>> Process 9 on crayN1-5150jo
>>>>>> Process 17 on crayN1-5150jo
>>>>>> Process 30 on crayN2-5150jo
>>>>>> Process 10 on crayN2-5150jo
>>>>>> Process 29 on crayN1-5150jo
>>>>>> Process 13 on crayN1-5150jo
>>>>>> Process 8 on crayN3-5150jo
>>>>>> Process 20 on crayN3-5150jo
>>>>>> Process 4 on crayN3-5150jo
>>>>>> Process 12 on crayN3-5150jo
>>>>>> Process 0 on crayN3-5150jo
>>>>>> Process 24 on crayN3-5150jo
>>>>>> Process 16 on crayN3-5150jo
>>>>>> Process 28 on crayN3-5150jo
>>>>>> Process 3 on crayN4-5150jo
>>>>>> Process 7 on crayN4-5150jo
>>>>>> Process 11 on crayN4-5150jo
>>>>>> Process 23 on crayN4-5150jo
>>>>>> Process 27 on crayN4-5150jo
>>>>>> Process 31 on crayN4-5150jo
>>>>>> Process 19 on crayN4-5150jo
>>>>>> Process 15 on crayN4-5150jo
>>>>>> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
>>>>>> wall clock time = 0.009401
>>>>>>
>>>>>> Best regards,
>>>>>> Sukanta
>>>>>>
>>>>>> On Wed, Feb 8, 2012 at 1:19 PM, Anthony Chan <chan at mcs.anl.gov> wrote:
>>>>>>>
>>>>>>> Hmm.. Not sure what is happening.. I don't see anything
>>>>>>> obviously wrong in your mpiexec verbose output (though
>>>>>>> I am not a hydra expert). Your code is now killed because of a
>>>>>>> segmentation fault. Naively, I would recompile WRF with -g
>>>>>>> and use a debugger to see where the segfault is. If you don't want
>>>>>>> to mess around with the WRF source code, you may want to contact the WRF
>>>>>>> developers to see if they have encountered a similar problem
>>>>>>> before.
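>>>>>>> For example [just a generic sketch, nothing WRF-specific], you could enable core dumps
>>>>>>> in the job script and look at the backtrace afterwards:
>>>>>>>
>>>>>>> ulimit -c unlimited
>>>>>>> mpiexec -f hostfile -np 32 ./wrf.exe
>>>>>>> gdb ./wrf.exe core    # then 'bt' shows the backtrace; the core file may be named core.<pid>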
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> Dear Anthony,
>>>>>>>>
>>>>>>>> Thanks for your response. Yes, I did try MP_STACK_SIZE and
>>>>>>>> OMP_STACKSIZE. The error is still there. I have attached a log file
>>>>>>>> (I ran mpiexec with the -verbose option). Maybe this will help.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Sukanta
>>>>>>>>
>>>>>>>> On Tue, Feb 7, 2012 at 3:28 PM, Anthony Chan <chan at mcs.anl.gov>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I am not familiar with WRF, and I am not sure if WRF uses any threads
>>>>>>>>> in dmpar mode. Did you try setting MP_STACK_SIZE or OMP_STACKSIZE?
>>>>>>>>>
>>>>>>>>> see: http://forum.wrfforum.com/viewtopic.php?f=6&t=255
>>>>>>>>>
>>>>>>>>> A.Chan
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am using a small cluster of 4 nodes (each with 8 cores + 24 GB
>>>>>>>>>> RAM).
>>>>>>>>>> OS: Ubuntu 11.10. The cluster uses an NFS file system and GigE
>>>>>>>>>> connections.
>>>>>>>>>>
>>>>>>>>>> I installed mpich2 and ran cpi.c program successfully.
>>>>>>>>>>
>>>>>>>>>> I installed WRF (http://www.wrf-model.org/index.php) using the
>>>>>>>>>> Intel
>>>>>>>>>> compilers (dmpar option)
>>>>>>>>>> I set ulimit -l and -s to be unlimited in .bashrc (all nodes)
>>>>>>>>>> I set memlock to be unlimited in limits.conf (all nodes)
>>>>>>>>>> I have password-less ssh (public key sharing) on all the nodes
>>>>>>>>>> I ran parallel jobs with 40x40x40, 40x40x50, and 40x40x60 grid
>>>>>>>>>> points
>>>>>>>>>> successfully. However, when I utilize 40x40x80 grid points, I
>>>>>>>>>> get
>>>>>>>>>> the
>>>>>>>>>> following MPI error:
>>>>>>>>>>
>>>>>>>>>> **********************************************************
>>>>>>>>>> Fatal error in PMPI_Wait: Other MPI error, error stack:
>>>>>>>>>> PMPI_Wait(183)............: MPI_Wait(request=0x34e83a4,
>>>>>>>>>> status=0x7fff7b24c400) failed
>>>>>>>>>> MPIR_Wait_impl(77)........:
>>>>>>>>>> dequeue_and_set_error(596): Communication error with rank 8
>>>>>>>>>> **********************************************************
>>>>>>>>>> Given that I can run the exact same simulation with a slightly smaller
>>>>>>>>>> number
>>>>>>>>>> of grid points without any problem, I suspect this error is related to
>>>>>>>>>> stack
>>>>>>>>>> size. What could be the problem?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sukanta
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Sukanta Basu
>>>>>>>>>> Associate Professor
>>>>>>>>>> North Carolina State University
>>>>>>>>>> http://www4.ncsu.edu/~sbasu5/
>>>>>>>>>> _______________________________________________
>>>>>>>>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>>>>>>>>>> To manage subscription options or unsubscribe:
>>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>>>>>> _______________________________________________
>>>>>>>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>>>>>>>>> To manage subscription options or unsubscribe:
>>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sukanta Basu
>>>>>>>> Associate Professor
>>>>>>>> North Carolina State University
>>>>>>>> http://www4.ncsu.edu/~sbasu5/
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sukanta Basu
>>>>>> Associate Professor
>>>>>> North Carolina State University
>>>>>> http://www4.ncsu.edu/~sbasu5/
>>>>> _______________________________________________
>>>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>>>>> To manage subscription options or unsubscribe:
>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>
>>>> _______________________________________________
>>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>>>> To manage subscription options or unsubscribe:
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>
>>>
>>>
>>> --
>>> Sukanta Basu
>>> Associate Professor
>>> North Carolina State University
>>> http://www4.ncsu.edu/~sbasu5/
>>> _______________________________________________
>>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>>> To manage subscription options or unsubscribe:
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>
>> _______________________________________________
>> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
>
>
> --
> Sukanta Basu
> Associate Professor
> North Carolina State University
> http://www4.ncsu.edu/~sbasu5/
> _______________________________________________
> mpich-discuss mailing list mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss