[Nek5000-users] MPI-IO in Nek

nek5000-users at lists.mcs.anl.gov nek5000-users at lists.mcs.anl.gov
Sun Oct 20 11:35:37 CDT 2013


Hi Hesam,

without knowing the details of your case I can't give you
a firm answer.

My suggestion was motivated simply by wanting to see whether it was
indeed a memory footprint problem.

Note that we have run tens of thousands of cases on hundreds
of platforms, on up to more than a million processes; there is
in general no difficulty getting to a particular point in
the computational parameter space.  You do, however, have to
have enough memory per node for the problem at hand.

Paul


----- Original Message -----
From: nek5000-users at lists.mcs.anl.gov
To: nek5000-users at lists.mcs.anl.gov
Sent: Sunday, October 20, 2013 10:57:47 AM
Subject: Re: [Nek5000-users] MPI-IO in Nek

Now my question is: why should we set np*lelt ~ nel, *provided that* the
code runs fine with lelt.gt.(nel/np) and is only killed at IO time?
If MPI-IO does its job, then every task writes only its own portion of
the data, which it has already been holding in memory. I am not sure
whether I am making sense here, but it would be helpful if you could
comment on that, since for really large jobs one would prefer not to
increase the number of tasks (or reduce lelt) only because of the IO problem!

Or maybe there are better ways of reducing the memory footprint that I am
not aware of?

Regards,
Hesam


-----Original Message-----
From: nek5000-users-bounces at lists.mcs.anl.gov
[mailto:nek5000-users-bounces at lists.mcs.anl.gov] On Behalf Of
nek5000-users at lists.mcs.anl.gov
Sent: Sunday, October 20, 2013 11:37 AM
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] MPI-IO in Nek

Hi Paul

Further reducing lelt to match np*lelt ~ nel did solve my problem, which I
had imagined had to do with parallel IO. Indeed, setting only p66=p67=6 in
usrdat2() and compiling with the MPIIO flag does provide MPI-IO.

Thanks again,
Hesam


-----Original Message-----
From: nek5000-users-bounces at lists.mcs.anl.gov
[mailto:nek5000-users-bounces at lists.mcs.anl.gov] On Behalf Of
nek5000-users at lists.mcs.anl.gov
Sent: Saturday, October 19, 2013 6:02 PM
To: nek5000-users at lists.mcs.anl.gov
Subject: Re: [Nek5000-users] MPI-IO in Nek


Hi Hesam,

Regarding the 8192-task case, did you reduce your memory footprint such
that 8192*lelt ~ nel, where nel is the number of elements in your case?
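
As a rough illustration of the np*lelt ~ nel rule of thumb above, the
following sketch computes the smallest lelt that still holds all elements
and how oversized a given lelt is. The function names and the element count
of 2,000,000 are hypothetical and chosen only for the arithmetic; 8192 is
the task count from this thread. It is not part of Nek5000.

```python
def min_lelt(nel: int, np_ranks: int) -> int:
    """Smallest per-rank element bound such that np_ranks * lelt >= nel."""
    if nel <= 0 or np_ranks <= 0:
        raise ValueError("nel and np_ranks must be positive")
    # Integer ceiling division: each rank must hold at least nel/np elements.
    return (nel + np_ranks - 1) // np_ranks


def oversize_factor(nel: int, np_ranks: int, lelt: int) -> float:
    """Ratio np*lelt / nel; values well above 1 mean lelt reserves extra storage."""
    return np_ranks * lelt / nel


if __name__ == "__main__":
    nel = 2_000_000   # hypothetical element count, for illustration only
    np_ranks = 8192   # task count mentioned in this thread
    print("minimum lelt:", min_lelt(nel, np_ranks))
    print("oversize factor for lelt=1000:", oversize_factor(nel, np_ranks, 1000))
```

A factor close to 1 corresponds to np*lelt ~ nel; a much larger factor
suggests lelt could be reduced to shrink the per-task memory footprint.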

I'm pretty certain that you can just set p65=1, p66=p67=6 in usrdat2() and
all should work if you've compiled from scratch with the MPI-IO flag.

In principle, the code will figure out from the header that your input has
multiple files, regardless of the value of p65, as should be the case since
the files dictate the input format, not some other flag.

In practice, I know that the preceding statement is true for the non-MPI-IO
case, but I'm not 100% certain for the MPI-IO case, which is focussed more
on the single-file approach.  I think, however, that it works in this case
as well.  If not, let me know; there are possible workarounds.

Paul


----- Original Message -----
From: nek5000-users at lists.mcs.anl.gov
To: nek5000-users at lists.mcs.anl.gov
Sent: Saturday, October 19, 2013 12:49:37 PM
Subject: Re: [Nek5000-users] MPI-IO in Nek

Also, I asked this specifically because my largest job, using 8192 tasks, was
killed at the first IO time due to being out of memory, although I was using
the .f000 format (either when every processor was outputting or 64 of them).
Any ideas as to why that might have happened?



Hesam 





From: nek5000-users-bounces at lists.mcs.anl.gov
[mailto:nek5000-users-bounces at lists.mcs.anl.gov] On Behalf Of
nek5000-users at lists.mcs.anl.gov
Sent: Saturday, October 19, 2013 12:18 PM
To: nek5000-users at lists.mcs.anl.gov
Subject: [Nek5000-users] MPI-IO in Nek 



Hi Neks 



After reading the following post, the inherent differences between .fld and
.f000 files became somewhat clear to me. However, I have a few questions:



https://lists.mcs.anl.gov/mailman/htdig/nek5000-users/2013-June/002147.html 



1) Is it correct that compiling with MPI-IO only requires adding this flag
in makenek: PPLIST="MPIIO"? Or is there anything else I need to do?

2) I noticed that if you do not add the above flag but set param(66) and
param(67) in usrdat2 to 6 or -6, it does generate .f000 files. I assume it
implicitly used MPIIO, right?

3) If I use several directories for outputting multiple files (by, say,
setting p65=-64 in the .rea file), how can I restart from those multiple
files associated with a single checkpoint in the .rea file?



Thanks a lot for your help 



Best, 

Hesam 




_______________________________________________
Nek5000-users mailing list
Nek5000-users at lists.mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
