collective write with 1 dimension being global
Rob Latham
robl at mcs.anl.gov
Thu Mar 17 16:22:49 CDT 2011
OK, I'm having a hard time mentally visualizing 4D, so let me make
sure I have a good understanding of the 3D version of this problem:
- Face-wise decomposition should work fine
- Splitting up the big 3D cube into N smaller cubes should work fine
(at least, that's a workload we've seen many times: there would be a
lot of bug reports if it did not)
- The problem, though, is when one dimension is the same for all
processors. In 3D space, that would mean... that all the sub-cubes end
up jammed against one face?
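
For concreteness, here's roughly how I picture the per-rank (start, count)
tuples in that case. This is only a sketch: the 2x2 process grid over x and
y, the even split, and the array extents are all made up for illustration;
z is the dimension that stays global on every rank, so its (start, count)
entry comes out the same everywhere.

      program show_tuples
      implicit none
      include 'mpif.h'

      integer, parameter :: nx = 32, ny = 32, nz = 16
      integer :: ierr, rank, nprocs, px, py
      integer(kind=MPI_OFFSET_KIND) :: start(3), count(3)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)   ! run with 4 ranks

      ! 2x2 grid over x and y; z spans the whole global extent on every rank
      px = mod(rank, 2)
      py = rank / 2

      count = (/ nx/2, ny/2, nz /)
      start(1) = px*count(1) + 1
      start(2) = py*count(2) + 1
      start(3) = 1          ! the z entry is (1, nz) on every rank, even
                            ! though the x/y pieces themselves differ

      print '(a,i2,a,3i5,a,3i5)', 'rank ', rank, ': start =', start, &
            '   count =', count

      call MPI_Finalize(ierr)
      end program show_tuples
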
If there's an (offset, count) tuple that's the same for every process,
then I guess that means the decomposition overlaps. For writes,
overlapping decompositions result in undefined behavior. For reads,
overlapping decompositions should just get sorted out in the MPI-IO
layer.
If that's the crux of your problem, I can verify with a test case.
Let me know if I understand your application correctly.
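
In the meantime, here is roughly the standalone test I'd start from for the
4D read itself. Again, only a sketch: the file and variable names come from
the snippet below, but the extents, the even 3D block split, and the
bare-bones error handling are placeholders you'd want to adapt.

      program read4d_test
      implicit none
      include 'mpif.h'
      include 'pnetcdf.inc'

      integer, parameter :: nx = 32, ny = 32, nz = 16, nv = 4
      integer :: ierr, rank, nprocs, ncid, varid, status
      integer :: dims(3), coords(3)
      integer(kind=MPI_OFFSET_KIND) :: start(4), count(4)
      real(kind=8), allocatable :: val(:,:,:,:)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      ! 3D process grid over x, y, z; the fourth (v) dimension stays global
      dims = 0
      call MPI_Dims_create(nprocs, 3, dims, ierr)
      coords(1) = mod(rank, dims(1))
      coords(2) = mod(rank/dims(1), dims(2))
      coords(3) = rank/(dims(1)*dims(2))

      ! assumes each dims(i) evenly divides the corresponding global extent
      count(1) = nx/dims(1)
      count(2) = ny/dims(2)
      count(3) = nz/dims(3)
      count(4) = nv
      start(1) = coords(1)*count(1) + 1   ! pnetcdf Fortran starts are 1-based
      start(2) = coords(2)*count(2) + 1
      start(3) = coords(3)*count(3) + 1
      start(4) = 1

      allocate( val(count(1), count(2), count(3), nv) )

      status = nfmpi_open(MPI_COMM_WORLD, "restart.nc", nf_nowrite, &
                          MPI_INFO_NULL, ncid)
      if (status /= nf_noerr) print *, trim(nfmpi_strerror(status))

      status = nfmpi_inq_varid(ncid, "Ion_Distribution", varid)
      status = nfmpi_get_vara_double_all(ncid, varid, start, count, val)
      if (status /= nf_noerr) print *, trim(nfmpi_strerror(status))

      status = nfmpi_close(ncid)
      deallocate(val)
      call MPI_Finalize(ierr)
      end program read4d_test
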
==rob
On Thu, Mar 10, 2011 at 05:08:39PM +0300, Nicholas K Allsopp wrote:
> Hi Rob,
>
> Below is the section of code which Mark is describing.
>
> Thanks
> Nick
>
> use param, only: f_now, nv
> use comms, only: die
> implicit none
>
> integer :: status, ncid, varID
> integer(kind=MPI_OFFSET_KIND) :: count(4), offset(4), tmp(1)
> real(kind=8) :: tmp2(1)
> real(kind=8), dimension(:,:,:,:), allocatable :: val
> logical :: here=.false.
>
> status = nfmpi_open( cart_comm, "restart.nc", nf_nowrite, &
> MPI_INFO_NULL, ncid )
>
> status = nfmpi_inq_dimlen( ncid, 1, tmp(1) )
>
> ! Read in the initial model time
> !------------------------------------------------------------------
> status = nfmpi_get_att_double( ncid, nf_global, "Model_Time", &
> tmp2(1) )
> model_time = tmp2(1)
>
> ! Read in the initial ion distribution field
> !------------------------------------------------------------------
> count = (/nx_local,ny_local,nz_local,nv/)
> offset(1) = global_start(1)
> offset(2) = global_start(2)
> offset(3) = global_start(3)
> offset(4) = 1
>
> allocate( val(nx_local,ny_local,nz_local,nv) )
>
> status = nfmpi_inq_varid( ncid, "Ion_Distribution", varID )
> status = nfmpi_get_vara_double_all( ncid, varID, offset, count, val )
> f_now = 0.0d0
> f_now( 1:nx_local,1:ny_local,1:nz_local,1:nv ) = val
> deallocate( val )
>
> status = nfmpi_close( ncid )
> return
>
>
>
> On 3/10/11 5:00 PM, "Mark P Cheeseman" <mark.cheeseman at kaust.edu.sa> wrote:
>
> > Hi Nick,
> >
> > Could you please put together a code snippet from the read_restart
> > subroutine in the io.f90 source file for Rob? I do not have access to
> > the KSL_Drift source at the moment (I purposely do not bring my laptop
> > with me, to keep myself from doing work).
> >
> > Thanks,
> > Mark
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Rob Latham <robl at mcs.anl.gov>
> > Date: Wednesday, March 9, 2011
> > Subject: collective write with 1 dimension being global
> > To: Mark Cheeseman <mark.cheeseman at kaust.edu.sa>
> > Cc: parallel-netcdf at mcs.anl.gov
> >
> >
> > On Sun, Mar 06, 2011 at 01:47:27PM +0300, Mark Cheeseman wrote:
> >> Hello,
> >>
> >> I have a 4D variable inside a NetCDF file that I wish to distribute over a
> >> number of MPI tasks. The variable will be decomposed over the first 3
> >> dimensions but not the fourth (i.e. the fourth dimension is kept global for
> >> all MPI tasks). In other words:
> >>
> >> GLOBAL_FIELD[nx,ny,nz,nv] ==>
> >> LOCAL_FIELD[nx_local,ny_local,nz_local,nv]
> >>
> >> I am trying to achieve this via an nfmpi_get_vara_double_all call, but the data
> >> keeps getting corrupted. I am sure that my offsets and local domain sizes
> >> are correct. If I modify my code to read only a single 3D slice (i.e. along
> >> 1 point in the fourth dimension), the code and input data are correct.
> >>
> >> Can parallel-netcdf handle a local dimension being equal to a global
> >> dimension? Or should I be using another call?
> >
> > Hi: sorry for the delay. Several of us are on travel this week.
> >
> > I think what you are trying to do is legal.
> >
> > Do you have a test case you could share? Does writing exhibit the
> > same bug? Does the C interface (either reading or writing) exhibit it?
> >
> > ==rob
> >
> > --
> > Rob Latham
> > Mathematics and Computer Science Division
> > Argonne National Lab, IL USA
> >
> >
>
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA