collective write with 1 dimension being global
Rob Latham
robl at mcs.anl.gov
Thu Mar 17 16:22:49 CDT 2011
OK, I'm having a hard time mentally visualizing 4D, so let me make
sure I have a good understanding of the 3D version of this problem:
- Face-wise decomposition should work fine
- Splitting up the big 3D cube into N smaller cubes should work fine
(at least, that's a workload we've seen many times: there would be a
lot of bug reports if it did not)
- The problem, though, is when one dimension is the same for all
processors. In 3D space, that would mean... that all the sub-cubes end
up jammed against one face?
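
For concreteness, here's roughly how I picture the per-rank (start, count)
tuples in that case. This is only a sketch: the 2x2 process grid over x and
y, the even split, and the array extents are all made up for illustration;
z is the dimension that stays global on every rank, so its (start, count)
entry comes out the same everywhere.

      program show_tuples
      implicit none
      include 'mpif.h'

      integer, parameter :: nx = 32, ny = 32, nz = 16
      integer :: ierr, rank, nprocs, px, py
      integer(kind=MPI_OFFSET_KIND) :: start(3), count(3)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)   ! run with 4 ranks

      ! 2x2 grid over x and y; z spans the whole global extent on every rank
      px = mod(rank, 2)
      py = rank / 2

      count = (/ nx/2, ny/2, nz /)
      start(1) = px*count(1) + 1
      start(2) = py*count(2) + 1
      start(3) = 1          ! the z entry is (1, nz) on every rank, even
                            ! though the x/y pieces themselves differ

      print '(a,i2,a,3i5,a,3i5)', 'rank ', rank, ': start =', start, &
            '   count =', count

      call MPI_Finalize(ierr)
      end program show_tuples
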
If there's an (offset, count) tuple that's the same for every process,
then I guess that means the decomposition overlaps. For writes,
overlapping decompositions result in undefined behavior. For reads,
overlapping decompositions should just get sorted out in the MPI-IO
layer.
If that's the crux of your problem, I can verify with a test case.
Let me know if I understand your application correctly.
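
In the meantime, here is roughly the standalone test I'd start from for the
4D read itself. Again, only a sketch: the file and variable names come from
the snippet below, but the extents, the even 3D block split, and the
bare-bones error handling are placeholders you'd want to adapt.

      program read4d_test
      implicit none
      include 'mpif.h'
      include 'pnetcdf.inc'

      integer, parameter :: nx = 32, ny = 32, nz = 16, nv = 4
      integer :: ierr, rank, nprocs, ncid, varid, status
      integer :: dims(3), coords(3)
      integer(kind=MPI_OFFSET_KIND) :: start(4), count(4)
      real(kind=8), allocatable :: val(:,:,:,:)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

      ! 3D process grid over x, y, z; the fourth (v) dimension stays global
      dims = 0
      call MPI_Dims_create(nprocs, 3, dims, ierr)
      coords(1) = mod(rank, dims(1))
      coords(2) = mod(rank/dims(1), dims(2))
      coords(3) = rank/(dims(1)*dims(2))

      ! assumes each dims(i) evenly divides the corresponding global extent
      count(1) = nx/dims(1)
      count(2) = ny/dims(2)
      count(3) = nz/dims(3)
      count(4) = nv
      start(1) = coords(1)*count(1) + 1   ! pnetcdf Fortran starts are 1-based
      start(2) = coords(2)*count(2) + 1
      start(3) = coords(3)*count(3) + 1
      start(4) = 1

      allocate( val(count(1), count(2), count(3), nv) )

      status = nfmpi_open(MPI_COMM_WORLD, "restart.nc", nf_nowrite, &
                          MPI_INFO_NULL, ncid)
      if (status /= nf_noerr) print *, trim(nfmpi_strerror(status))

      status = nfmpi_inq_varid(ncid, "Ion_Distribution", varid)
      status = nfmpi_get_vara_double_all(ncid, varid, start, count, val)
      if (status /= nf_noerr) print *, trim(nfmpi_strerror(status))

      status = nfmpi_close(ncid)
      deallocate(val)
      call MPI_Finalize(ierr)
      end program read4d_test
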
==rob
On Thu, Mar 10, 2011 at 05:08:39PM +0300, Nicholas K Allsopp wrote:
> Hi Rob,
>
> Below is the section of code which Mark is describing.
>
> Thanks
> Nick
>
> use param, only: f_now, nv
> use comms, only: die
> implicit none
>
> integer :: status, ncid, varID
> integer(kind=MPI_OFFSET_KIND) :: count(4), offset(4), tmp(1)
> real(kind=8) :: tmp2(1)
> real(kind=8), dimension(:,:,:,:), allocatable :: val
> logical :: here=.false.
>
> status = nfmpi_open( cart_comm, "restart.nc", nf_nowrite, &
> MPI_INFO_NULL, ncid )
>
> status = nfmpi_inq_dimlen( ncid, 1, tmp(1) )
>
> ! Read in the initial model time
> !------------------------------------------------------------------
> status = nfmpi_get_att_double( ncid, nf_global, "Model_Time", &
> tmp2(1) )
> model_time = tmp2(1)
>
> ! Read in the initial ion distribution field
> !------------------------------------------------------------------
> count = (/nx_local,ny_local,nz_local,nv/)
> offset(1) = global_start(1)
> offset(2) = global_start(2)
> offset(3) = global_start(3)
> offset(4) = 1
>
> allocate( val(nx_local,ny_local,nz_local,nv) )
>
> status = nfmpi_inq_varid( ncid, "Ion_Distribution", varID )
> status = nfmpi_get_vara_double_all( ncid, varID, offset, count, val )
> f_now = 0.0d0
> f_now( 1:nx_local,1:ny_local,1:nz_local,1:nv ) = val
> deallocate( val )
>
> status = nfmpi_close( ncid )
> return
>
>
>
> On 3/10/11 5:00 PM, "Mark P Cheeseman" <mark.cheeseman at kaust.edu.sa> wrote:
>
> > Hi Nick,
> >
> > Could you please put together a code snippet from the read_restart
> > subroutine in the io.f90 source file for Rob? I do not have access to
> > the KSL_Drift source at the moment (I purposely do not bring my laptop
> > with me, to keep myself from doing work).
> >
> > Thanks,
> > Mark
> >
> >
> >
> > ---------- Forwarded message ----------
> > From: Rob Latham <robl at mcs.anl.gov>
> > Date: Wednesday, March 9, 2011
> > Subject: collective write with 1 dimension being global
> > To: Mark Cheeseman <mark.cheeseman at kaust.edu.sa>
> > Cc: parallel-netcdf at mcs.anl.gov
> >
> >
> > On Sun, Mar 06, 2011 at 01:47:27PM +0300, Mark Cheeseman wrote:
> >> Hello,
> >>
> >> I have a 4D variable inside a NetCDF file that I wish to distribute over a
> >> number of MPI tasks. The variable will be decomposed over the first 3
> >> dimensions but not the fourth (i.e. the fourth dimension is kept global for
> >> all MPI tasks). In other words:
> >>
> >> GLOBAL_FIELD[nx,ny,nz,nv] ==>
> >> LOCAL_FIELD[nx_local,ny_local,nz_local,nv]
> >>
> >> I am trying to achieve this via an nfmpi_get_vara_double_all call, but the data
> >> keeps getting corrupted. I am sure that my offsets and local domain sizes
> >> are correct. If I modify my code to read only a single 3D slice (i.e. along
> >> 1 point in the fourth dimension), the code and input data are correct.
> >>
> >> Can parallel-netcdf handle a local dimension being equal to a global
> >> dimension? Or should I be using another call?
> >
> > Hi: sorry for the delay. Several of us are on travel this week.
> >
> > I think what you are trying to do is legal.
> >
> > Do you have a test case you could share? Does writing exhibit the
> > same bug? Does the C interface (either reading or writing) exhibit it?
> >
> > ==rob
> >
> > --
> > Rob Latham
> > Mathematics and Computer Science Division
> > Argonne National Lab, IL USA
> >
> >
>
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA