collective write with 1 dimension being global
Mark P Cheeseman
mark.cheeseman at kaust.edu.sa
Thu Mar 17 21:39:13 CDT 2011
Hi,
The attached program seems to do what I want. I am currently traveling back to KAUST from the US, arriving Saturday morning. I will retest the write procedure on our BG/P on Saturday/Sunday.
Thanks for the quick response; it's greatly appreciated.
Best Regards,
Mark
Sent from my iPhone
On Mar 17, 2011, at 3:55 PM, "Wei-keng Liao" <wkliao at ece.northwestern.edu> wrote:
> Hi, Mark,
>
> Based on your I/O description, I wrote this simple program.
> The first half creates a file, writes a 4D array, and closes the file.
> The second half opens the file and reads it back using the same partitioning.
> Please let us know if this is similar to your I/O pattern.
> I tested it and the data seems OK.
>
> <4d.f90>
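>
> (The 4d.f90 attachment itself is not reproduced in the archive. As a rough
> sketch only, with illustrative names and a simple 1-D split along z standing
> in for the full 3-D decomposition, such a test might look like this:)
>
> program test4d
>     implicit none
>     include 'mpif.h'
>     include 'pnetcdf.inc'
>
>     integer, parameter :: nx=4, ny=4, nz=8, nv=3
>     integer :: rank, nprocs, err, ncid, varid, dimids(4), nz_local
>     integer(kind=MPI_OFFSET_KIND) :: start(4), count(4)
>     real(kind=8), allocatable :: buf(:,:,:,:)
>
>     call MPI_Init(err)
>     call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)
>     call MPI_Comm_size(MPI_COMM_WORLD, nprocs, err)
>
>     ! split z evenly among ranks (assumes nprocs divides nz);
>     ! x, y and the fourth dimension v stay whole on every rank
>     nz_local = nz / nprocs
>     allocate( buf(nx, ny, nz_local, nv) )
>     buf = dble(rank)
>
>     start = (/ 1, 1, rank*nz_local + 1, 1 /)
>     count = (/ nx, ny, nz_local, nv /)
>
>     ! write phase: define dimensions and the variable, then a collective put
>     err = nfmpi_create(MPI_COMM_WORLD, 'test4d.nc', nf_clobber, &
>                        MPI_INFO_NULL, ncid)
>     err = nfmpi_def_dim(ncid, 'x', int(nx, MPI_OFFSET_KIND), dimids(1))
>     err = nfmpi_def_dim(ncid, 'y', int(ny, MPI_OFFSET_KIND), dimids(2))
>     err = nfmpi_def_dim(ncid, 'z', int(nz, MPI_OFFSET_KIND), dimids(3))
>     err = nfmpi_def_dim(ncid, 'v', int(nv, MPI_OFFSET_KIND), dimids(4))
>     err = nfmpi_def_var(ncid, 'field', nf_double, 4, dimids, varid)
>     err = nfmpi_enddef(ncid)
>     err = nfmpi_put_vara_double_all(ncid, varid, start, count, buf)
>     err = nfmpi_close(ncid)
>
>     ! read phase: reopen and read back with the same partitioning
>     buf = -1.0d0
>     err = nfmpi_open(MPI_COMM_WORLD, 'test4d.nc', nf_nowrite, &
>                      MPI_INFO_NULL, ncid)
>     err = nfmpi_inq_varid(ncid, 'field', varid)
>     err = nfmpi_get_vara_double_all(ncid, varid, start, count, buf)
>     err = nfmpi_close(ncid)
>
>     if (any(buf /= dble(rank))) print *, rank, ': data mismatch'
>
>     deallocate(buf)
>     call MPI_Finalize(err)
> end program test4d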
>
>
> Wei-keng
>
> On Mar 17, 2011, at 4:22 PM, Rob Latham wrote:
>
>> OK, I'm having a hard time mentally visualizing 4D, so let me make
>> sure I have a good understanding of the 3D version of this problem:
>>
>> - Face-wise decomposition should work fine
>> - Splitting up the big 3D cube into N smaller cubes should work fine
>> (at least, that's a workload we've seen many times: there would be a
>> lot of bug reports if it did not)
>>
>> - The problem, though, is when one dimension is the same for all
>> processors. In 3D space, that would mean... that all the sub-cubes end
>> up jammed against one face?
>>
>> If there's an (offset, count) tuple that's the same for every process,
>> then I guess that means the decomposition overlaps. For writes,
>> overlapping decompositions result in undefined behavior. For reads,
>> overlapping decompositions should just get sorted out in the MPI-IO
>> layer.
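>>
>> (For concreteness, a rough sketch of the access pattern being discussed,
>> with illustrative names: each rank takes a disjoint block in x/y/z but the
>> full extent of the fourth dimension, e.g.
>>
>>    ! hypothetical per-rank hyperslab: the x/y/z block differs per rank,
>>    ! the fourth (v) dimension is read in full by every rank
>>    offset = (/ x_start, y_start, z_start, 1 /)
>>    count  = (/ nx_local, ny_local, nz_local, nv /)
>>    status = nfmpi_get_vara_double_all( ncid, varID, offset, count, val )
>>
>> As long as the x/y/z blocks are disjoint, these 4D subarrays do not overlap.)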
>>
>> If that's the crux of your problem, I can verify with a test case.
>> Let me know if I understand your application correctly.
>>
>> ==rob
>>
>> On Thu, Mar 10, 2011 at 05:08:39PM +0300, Nicholas K Allsopp wrote:
>>> Hi Rob,
>>>
>>> Below is the section of code which Mark is describing.
>>>
>>> Thanks
>>> Nick
>>>
>>> use param, only: f_now, nv
>>> use comms, only: die
>>> implicit none
>>> include 'mpif.h'
>>> include 'pnetcdf.inc'
>>>
>>> integer :: status, ncid, varID
>>> integer(kind=MPI_OFFSET_KIND) :: count(4), offset(4), tmp(1)
>>> real(kind=8) :: tmp2(1)
>>> real(kind=8), dimension(:,:,:,:), allocatable :: val
>>> logical :: here=.false.
>>>
>>> status = nfmpi_open( cart_comm, "restart.nc", nf_nowrite, &
>>> MPI_INFO_NULL, ncid )
>>>
>>> status = nfmpi_inq_dimlen( ncid, 1, tmp(1) )
>>>
>>> ! Read in the initial model time
>>> !------------------------------------------------------------------
>>> status = nfmpi_get_att_double( ncid, nf_global, "Model_Time", &
>>> tmp2(1) )
>>> model_time = tmp2(1)
>>>
>>> ! Read in the initial ion distribution field
>>> !------------------------------------------------------------------
>>> count = (/nx_local,ny_local,nz_local,nv/)
>>> offset(1) = global_start(1)
>>> offset(2) = global_start(2)
>>> offset(3) = global_start(3)
>>> offset(4) = 1
>>>
>>> allocate( val(nx_local,ny_local,nz_local,nv) )
>>>
>>> status = nfmpi_inq_varid( ncid, "Ion_Distribution", varID )
>>> status = nfmpi_get_vara_double_all( ncid, varID, offset, count, val )
>>> f_now = 0.0d0
>>> f_now( 1:nx_local,1:ny_local,1:nz_local,1:nv ) = val
>>> deallocate( val )
>>>
>>> status = nfmpi_close( ncid )
>>> return
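>>>
>>> (For the write side with the same partitioning, which is the subject of this
>>> thread, the call would be the mirror image. This is only a sketch reusing the
>>> names above; it assumes the file has been created or opened in write mode and
>>> that f_now holds the local block:)
>>>
>>> ! mirror-image collective write with the same offset and count
>>> allocate( val(nx_local,ny_local,nz_local,nv) )
>>> val = f_now( 1:nx_local, 1:ny_local, 1:nz_local, 1:nv )
>>> status = nfmpi_inq_varid( ncid, "Ion_Distribution", varID )
>>> status = nfmpi_put_vara_double_all( ncid, varID, offset, count, val )
>>> deallocate( val )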
>>>
>>>
>>>
>>> On 3/10/11 5:00 PM, "Mark P Cheeseman" <mark.cheeseman at kaust.edu.sa> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> Could you please make a code snippet from the read_restart subroutine
>>>> in the io.f90 source file for Rob? I do not currently have access to the
>>>> KSL_Drift source (I purposely did not bring my laptop, to keep myself from
>>>> doing work).
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Rob Latham <robl at mcs.anl.gov>
>>>> Date: Wednesday, March 9, 2011
>>>> Subject: collective write with 1 dimension being global
>>>> To: Mark Cheeseman <mark.cheeseman at kaust.edu.sa>
>>>> Cc: parallel-netcdf at mcs.anl.gov
>>>>
>>>>
>>>> On Sun, Mar 06, 2011 at 01:47:27PM +0300, Mark Cheeseman wrote:
>>>>> Hello,
>>>>>
>>>>> I have a 4D variable inside a NetCDF file that I wish to distribute over a
>>>>> number of MPI tasks. The variable will be decomposed over the first 3
>>>>> dimensions but not the fourth (i.e. the fourth dimension is kept global for
>>>>> all MPI tasks). In other words:
>>>>>
>>>>> GLOBAL_FIELD[nx,ny,nz,nv] ==>
>>>>> LOCAL_FIELD[nx_local,ny_local,nz_local,nv]
>>>>>
>>>>> I am trying to achieve this via an nfmpi_get_vara_double_all call, but the
>>>>> data keeps getting corrupted. I am sure that my offsets and local domain
>>>>> sizes are correct. If I modify my code to read only a single 3D slice (i.e.
>>>>> one point along the fourth dimension), the code works and the data read in
>>>>> is correct.
>>>>>
>>>>> Can parallel-netcdf handle a local dimension being equal to a global
>>>>> dimension? Or should I be using another call?
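>>>>>
>>>>> (In start/count terms, the two variants being compared are roughly the
>>>>> following; names are illustrative:
>>>>>
>>>>> ! one collective call with the fourth dimension kept global
>>>>> offset = (/ x_start, y_start, z_start, 1 /)
>>>>> count  = (/ nx_local, ny_local, nz_local, nv /)
>>>>> status = nfmpi_get_vara_double_all( ncid, varID, offset, count, val )
>>>>>
>>>>> ! versus one 3D slice at a time along the fourth dimension
>>>>> count(4) = 1
>>>>> do k = 1, nv
>>>>>    offset(4) = k
>>>>>    status = nfmpi_get_vara_double_all( ncid, varID, offset, count, &
>>>>>                                        val(:,:,:,k) )
>>>>> end do
>>>>> )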
>>>>
>>>> Hi: sorry for the delay. Several of us are on travel this week.
>>>>
>>>> I think what you are trying to do is legal.
>>>>
>>>> Do you have a test case you could share? Does writing exhibit the
>>>> same bug? Does the C interface (either reading or writing)?
>>>>
>>>> ==rob
>>>>
>>>> --
>>>> Rob Latham
>>>> Mathematics and Computer Science Division
>>>> Argonne National Lab, IL USA
>>>>
>>>>
>>>
>>
>> --
>> Rob Latham
>> Mathematics and Computer Science Division
>> Argonne National Lab, IL USA
>>
>