[Nek5000-users] MPI-IO in Nek

nek5000-users at lists.mcs.anl.gov nek5000-users at lists.mcs.anl.gov
Sun Oct 20 12:35:59 CDT 2013


Hi Hesam,

Thanks.  Yes - this paints a clear picture.  You are afforded
a total of 512 MB per rank because you are running two ranks per core
at 1 GB per core.  The nek static declaration is 444 MB in Case 1, and
less than that in Case 2.
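
To spell the arithmetic out (a throwaway illustration using the node
and rank counts from your message below; it is not anything from the
Nek source):

c     Back-of-the-envelope check (illustrative only, not Nek5000 code):
c     16 GB per node shared by 32 MPI ranks leaves ~512 MB per rank;
c     the static (bss) image already claims ~444 MB of that in Case 1.
      program rankmem
      real memnod, nranks, memrnk, bss1, bss2
      memnod = 16.*1024.       ! MB per node (16 cores x 1 GB)
      nranks = 32.             ! MPI ranks per node (2 per core)
      bss1   = 444.            ! MB static image, lelt=70 (Case 1)
      bss2   = 333.            ! MB static image, lelt=50 (Case 2)
      memrnk = memnod/nranks   ! = 512 MB available per rank
      write(6,*) 'per-rank budget  (MB):', memrnk
      write(6,*) 'headroom, Case 1 (MB):', memrnk - bss1
      write(6,*) 'headroom, Case 2 (MB):', memrnk - bss2
      end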

There is also some non-static allocation in the nek C routines,
particularly for the coarse-grid solver (unless you're using the AMG
coarse-grid solver, which has a smaller memory footprint in the
large-problem-size limit), and there is the MPI/OS overhead.

From what you've described, you ran into this latter overhead when
you hit the MPI-IO section.

On another note, your 8192-rank job has about 27000 points per
rank.  You can easily run on more ranks --- typically down to the
5000-10000 points/rank limit --- and still scale very well, assuming
you have that many cores available.  (I'm not certain of the size of
your machine.)  I can tell you that at ANL they prefer us to run
large, short jobs rather than small, long ones, so as long as
efficiency isn't falling off we try to go to the large-P limit.

Paul


----- Original Message -----
From: nek5000-users at lists.mcs.anl.gov
To: nek5000-users at lists.mcs.anl.gov
Sent: Sunday, October 20, 2013 12:12:07 PM
Subject: Re: [Nek5000-users] MPI-IO in Nek

Hi Paul

I think I got my answer from your reply, but just to provide some
specific information about this problem, in case it is useful for
anyone else:

In both cases I used 8192 MPI tasks (256 nodes, each with 16 cores at
1 GB RAM per core, i.e. 16 GB RAM per node, running 32 ranks per node)
for a problem size of nel = 368,300 elements with lx1=10.  The first
attempt, which failed only at the first IO time, had this information
at the tail end of compiler.out (with lelt=70):

#############################################################
#                  Compilation successful!                  #
#############################################################
    text    data       bss       dec      hex filename
13172569 1933750 428909448 444015767 1a772497 nek5000

After reducing lelt to lelt=50, which gave the following information,
the problem was resolved with no memory issues and the IO worked fine
(note the difference from ~444 MB to ~333 MB).

#############################################################
#                  Compilation successful!                  #
#############################################################
    text    data       bss       dec      hex filename
13172137 1933838 318486536 333592511 13e237bf nek5000


I suspect the former, unsuccessful case was on the marginal side
memory-wise, whereas the second case was more relaxed.
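
For reference, the change between the two builds is just lelt in the
SIZE file; roughly along these lines (an illustrative excerpt only,
and the lelg value here is just a placeholder large enough for my nel):

c     Illustrative excerpt from the SIZE file (not the whole file).
c     lelt is the maximum number of elements held per MPI rank; the
c     big static arrays are dimensioned by it, which is why shrinking
c     it from 70 to 50 dropped bss from ~429 MB to ~318 MB above.
c     With nel = 368,300 on np = 8192 ranks, nel/np ~ 45, so lelt=50
c     still covers every rank's partition with far fewer unused slots.
      parameter (ldim=3)                 ! spatial dimension
      parameter (lx1=10,ly1=lx1,lz1=lx1) ! GLL points per element per dir.
      parameter (lelg=400000)            ! upper bound on global nel
      parameter (lelt=50,lelv=lelt)      ! max elements per MPI rank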

Regards,
Hesam


-----Original Message-----
From: nek5000-users-bounces at lists.mcs.anl.gov
[mailto:nek5000-users-bounces at lists.mcs.anl.gov] On Behalf Of
nek5000-users at lists.mcs.anl.gov
Sent: Sunday, October 20, 2013 12:36 PM
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] MPI-IO in Nek


Hi Hesam,

Without knowing the details of your case I can't give you a firm answer.

My suggestion was motivated simply by wanting to see whether it was
indeed a memory-footprint problem.

Note that we have run tens of thousands of cases on hundreds of
platforms and up to over a million processes; there is in general no
difficulty getting to a particular point in the computational
parameter space.  You do, however, have to have enough memory per
node for the problem at hand.

Paul


----- Original Message -----
From: nek5000-users at lists.mcs.anl.gov
To: nek5000-users at lists.mcs.anl.gov
Sent: Sunday, October 20, 2013 10:57:47 AM
Subject: Re: [Nek5000-users] MPI-IO in Nek

Now my question is: why should we set np*lelt ~ nel, *provided that*
the code runs fine with lelt.gt.(nel/np) and is only killed at IO
time?  Because if MPI-IO does its job, then every single task outputs
its own portion, which it has already been dealing with in its own
memory.  I am not sure if I am making sense here, but it would be
helpful if you could comment on that, since for really large jobs one
would prefer not to increase the number of tasks (or reduce lelt) only
because of an IO problem!

Or maybe there are better ways of reducing the memory footprint that
I am not aware of?

Regards,
Hesam


-----Original Message-----
From: nek5000-users-bounces at lists.mcs.anl.gov
[mailto:nek5000-users-bounces at lists.mcs.anl.gov] On Behalf Of
nek5000-users at lists.mcs.anl.gov
Sent: Sunday, October 20, 2013 11:37 AM
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] MPI-IO in Nek

Hi Paul

Further reducing lelt so that np*lelt ~ nel did solve my problem,
which I had imagined had to do with parallel IO.  Indeed, setting only
p66=p67=6 in usrdat2() and compiling with the MPIIO flag does provide
MPI-IO.

Thanks again,
Hesam


-----Original Message-----
From: nek5000-users-bounces at lists.mcs.anl.gov
[mailto:nek5000-users-bounces at lists.mcs.anl.gov] On Behalf Of
nek5000-users at lists.mcs.anl.gov
Sent: Saturday, October 19, 2013 6:02 PM
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] MPI-IO in Nek


Hi Hesam,

Regarding the 8192-task case: did you reduce your memory footprint
such that 8192*lelt ~ nel, the number of elements in your case?

I'm pretty certain that you just set p65=1, p66=p67=6 in usrdat2() and all
should work if you've compiled from scratch with the MPI-IO flag.
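
i.e., something along these lines in usrdat2 (an illustrative fragment
only; everything else in the routine stays as you already have it):

c     Illustrative fragment of usrdat2 in the .usr file, assuming the
c     case was compiled from scratch with PPLIST="MPIIO" in makenek.
      subroutine usrdat2
      include 'SIZE'
      include 'TOTAL'

      param(65) = 1    ! number of output files (1 = a single file)
      param(66) = 6    ! output format: 6 = binary .f000 via MPI-IO
      param(67) = 6    ! restart/input format, matching the output

      return
      end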

In principle, the code will figure out from the header that your input has
multiple files, regardless of the value of p65, as should be the case since
the files dictate the input format, not some other flag.

In practice, I know that the preceding statement is true for the
non-MPI-IO case, but I'm not 100% certain for the MPI-IO case, which
is focussed more on the single-file approach.  I think, however, that
it works in this case as well.  If not, let me know --- there are
possible workarounds.

Paul


----- Original Message -----
From: nek5000-users at lists.mcs.anl.gov
To: nek5000-users at lists.mcs.anl.gov
Sent: Saturday, October 19, 2013 12:49:37 PM
Subject: Re: [Nek5000-users] MPI-IO in Nek

Also, I asked this specifically because my largest job using 8192 tasks
was killed at the first IO time due to running out of memory, although
I was using the .f000 format (either with every processor writing
output or with only 64 of them).  Any ideas as to why that might have
happened?

Hesam

From: nek5000-users-bounces at lists.mcs.anl.gov
[mailto:nek5000-users-bounces at lists.mcs.anl.gov] On Behalf Of
nek5000-users at lists.mcs.anl.gov
Sent: Saturday, October 19, 2013 12:18 PM
To: nek5000-users at lists.mcs.anl.gov
Subject: [Nek5000-users] MPI-IO in Nek

Hi Neks

After reading the following post, the inherent differences between .fld
and .f000 files became somewhat clear to me.  However, I have a few
questions:

https://lists.mcs.anl.gov/mailman/htdig/nek5000-users/2013-June/002147.html

1) Is it correct that compiling with MPI-IO only requires adding this
flag in makenek: PPLIST="MPIIO"?  Or is there anything else I need to
do?

2) I noticed that if you do not add the above flag but set param(66)
and param(67) in usrdat2 to 6 or -6, it does generate .f000 files.  I
assume it implicitly used MPI-IO, right?

3) If I use several directories for outputting multiple files (by, say,
setting p65=-64 in the .rea file), how can I restart from those
multiple files associated with a single checkpoint in the .rea file?

Thanks a lot for your help

Best,
Hesam

_______________________________________________
Nek5000-users mailing list
Nek5000-users at lists.mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users