[Nek5000-users] How to run Nek5000 in cluster based on mpich?

nek5000-users at lists.mcs.anl.gov nek5000-users at lists.mcs.anl.gov
Thu Jun 5 00:41:37 CDT 2014


I see. Perhaps my problem is specific to my machine as well. Thank you for your help.

Xianbei







At 2014-06-05 01:35:31, nek5000-users at lists.mcs.anl.gov wrote:

I’m just launching the code as explained in the ‘quick start’ chapter of the user manual. On my computer I had a problem with the sleep time specified in the nek script. That problem is specific to my machine, but the description of your problem reminded me of it. However, I don’t know whether it is really related! Good luck!








Best regards,
Emmanuel





On 5 Jun 2014, at 14:58, <nek5000-users at lists.mcs.anl.gov> <nek5000-users at lists.mcs.anl.gov> wrote:


Hi, Emmanuel:
     I tried your method, but even after changing the sleep to 10 seconds I still get the same error. Could you explain more clearly how you managed to run on different nodes? I mean, how do you set up and run the job, and what is different from the usual case?
Best regards
Xianbei







At 2014-06-05 12:57:27, nek5000-users at lists.mcs.anl.gov wrote:
Hi all,


I would just like to share my experience, because I had a similar problem.


In my case, it appears that the mpiexec command takes some time to start, and it has not finished starting when the “sleep 2” instruction ends. So the command rm -f SESSION.NAME is executed before mpiexec is fully up. Consequently, nek can’t find the session file and it searches for nek.rea, which doesn’t exist.
In my case the problem appears when using more than 4 CPUs.


I just added a few more seconds to the sleep time and everything works fine now. In my case, "sleep 5" is enough to launch nek on 24 CPUs.
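
A possible alternative to tuning the sleep time, given here only as a minimal sketch: tie the removal of SESSION.NAME to the end of the run rather than to a fixed delay. It assumes the launch script behaves as described above (it starts mpiexec in the background and later runs rm -f SESSION.NAME). The helper name run_and_cleanup is hypothetical and is not part of the Nek5000 scripts.

```shell
#!/bin/sh
# Sketch only, not the stock nekbmpi script.
# run_and_cleanup is a hypothetical helper: it runs a command in the
# background, sends its output to a log file, and removes SESSION.NAME
# only after that command has exited (no fixed sleep involved).
run_and_cleanup() {
    logfile=$1
    shift
    # The whole subshell is backgrounded; rm runs after "$@" finishes,
    # so the session file cannot disappear while the job is starting.
    ( "$@" > "$logfile" 2>&1; rm -f SESSION.NAME ) &
}

# Possible use inside a launch script (case name in $1, process count in $2):
#   run_and_cleanup "$1.log.$2" mpiexec -n "$2" ./nek5000
```

With this arrangement the script still returns immediately, and the session file stays in place for the whole run, so the start-up time of mpiexec no longer matters.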


Hope that helps!




Best regards,
Emmanuel





On 5 Jun 2014, at 14:11, <nek5000-users at lists.mcs.anl.gov> <nek5000-users at lists.mcs.anl.gov> wrote:


Hi, Paul:
    I changed this line:
    mpiexec -np $2 ./nek5000 > $1.log.$2 &
   to

   mpiexec -n $2 ./nek5000 > $1.log.$2 &
   in nekbmpi, and then renamed the script to 'runnek' on each node.
   Then I typed ./runnek xxx 48
   However, the error still exists.
Xianbei






At 2014-06-04 11:20:52, nek5000-users at lists.mcs.anl.gov wrote:



Hi Xianbei,


Have a look at the nekbmpi script and modify to match the job submission
procedure on your cluster.


Paul


From: nek5000-users-bounces at lists.mcs.anl.gov [nek5000-users-bounces at lists.mcs.anl.gov] on behalf of nek5000-users at lists.mcs.anl.gov [nek5000-users at lists.mcs.anl.gov]
Sent: Wednesday, June 04, 2014 9:34 AM
To: Nek5000
Subject: [Nek5000-users] How to run Nek5000 in cluster based on mpich?


Hi, all:
   I have just built a small cluster of computers based on MPICH. I have tested the cluster, and every node can be reached without problems. Now I want to run my simulation on it, but I don't know how to run Nek. As far as I know, on a single computer one can do this:
  nekmpi XXX X 
or
  nekbmpi XXX X
  but this does not work on the cluster! I believe most of you have tried this, and I would appreciate it if anyone could give me some advice.
Best regards
Xianbei 




_______________________________________________
Nek5000-users mailing list
Nek5000-users at lists.mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users





