[petsc-users] Problem with DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90
TAY wee-beng
zonexo at gmail.com
Sat Apr 19 09:14:56 CDT 2014
On 19/4/2014 6:48 PM, Matthew Knepley wrote:
> On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> On 19/4/2014 1:17 PM, Barry Smith wrote:
>
> On Apr 19, 2014, at 12:11 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> On 19/4/2014 12:10 PM, Barry Smith wrote:
>
> On Apr 18, 2014, at 9:57 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> On 19/4/2014 3:53 AM, Barry Smith wrote:
>
> Hmm,
>
> Interface DMDAVecGetArrayF90
>   Subroutine DMDAVecGetArrayF903(da1,v,d1,ierr)
>     USE_DM_HIDE
>     DM_HIDE da1
>     VEC_HIDE v
>     PetscScalar,pointer :: d1(:,:,:)
>     PetscErrorCode ierr
>   End Subroutine
>
> So d1 is an F90 POINTER. But your subroutine seems to be treating it as a "plain old Fortran array"?
>
>   real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
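>
> For reference, a minimal sketch of the caller-side declaration that DMDAVecGetArrayF90() expects (u_array/da_u/u_local are just the names used in this thread):
>
>   PetscScalar, pointer :: u_array(:,:,:)   ! an F90 pointer, not a plain array
>   PetscErrorCode ierr
>
>   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>   ! ... work with u_array here ...
>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)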
>
> Hi,
>
> So d1 is a pointer, and it's different if I declare it as a "plain old Fortran array"? I ask because I declare it as a Fortran array and it works without any problem if I only call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u".
>
> But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u", "v" and "w", errors start to happen. I wonder why...
>
> Also, suppose I call:
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
> u_array ....
>
> v_array .... etc
>
> Now, to restore the arrays, does the order in which they are restored matter?
>
> No, it should not matter. If it does matter, that is a sign that memory has been written to incorrectly earlier in the code.
>
> Hi,
>
> Hmm, I have been getting different results with different Intel compilers. I'm not sure if MPI plays a part, but I'm only using a single processor. In debug mode, things run without problems. In optimized mode, in some cases the code aborts even when doing simple initialization:
>
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
> call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>
> u_array = 0.d0
>
> v_array = 0.d0
>
> w_array = 0.d0
>
> p_array = 0.d0
>
>
> call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> The code aborts at the call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr), giving a segmentation error. But another version of the Intel compiler passes through this part without error. Since the behavior differs among compilers, is this a PETSc bug or an Intel bug? Or MVAPICH or OpenMPI?
>
>
> We do this in a bunch of examples. Can you reproduce this different
> behavior in src/dm/examples/tutorials/ex11f90.F?
Hi Matt,
Do you mean putting the above lines into ex11f90.F and testing?
Thanks.
Regards.
>
> Matt
>
> As in w, then v and u?
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> thanks
>
> Note also that the beginning and end indices of u,v,w are different for each process; see for example
> http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F
> (and they do not start at 1). That example shows how to get the loop bounds.
>
> Hi,
>
> In my case, I fixed u,v,w such that their indices are the same. I also checked using DMDAGetCorners and DMDAGetGhostCorners. Now the problem lies in my subroutine treating them as a "plain old Fortran array".
>
> If I declare them as pointers, do their indices follow the C 0-start convention?
>
> Not really. It is that on each process you need to access them using the indices indicated by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors. So really C or Fortran doesn't make any difference.
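>
> A minimal sketch of what that looks like for one of the local (ghosted) vectors in this thread (i0,j0,k0,iw,jw,kw and the loop variables are placeholder names):
>
>   PetscInt i0,j0,k0,iw,jw,kw,i,j,k
>   PetscErrorCode ierr
>   PetscScalar,pointer :: u_array(:,:,:)
>
>   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>   call DMDAGetGhostCorners(da_u,i0,j0,k0,iw,jw,kw,ierr)  ! ghosted corners, since u_local is a local vector
>   do k = k0, k0+kw-1
>     do j = j0, j0+jw-1
>       do i = i0, i0+iw-1
>         u_array(i,j,k) = 0.d0
>       end do
>     end do
>   end do
>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)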
>
>
> So my problem now is that in my old MPI code, u(i,j,k) follows the Fortran 1-start convention. Is there some way to handle this so that I do not have to change my u(i,j,k) to u(i-1,j-1,k-1)?
>
> If your code wishes to access them with indices plus one from the values returned by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors, then you need to manually subtract off the 1.
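>
> One possible way to keep 1-based indexing without editing every u(i,j,k) is Fortran 2003 pointer bounds remapping; this is only a sketch, assuming your compiler supports it (no data is copied):
>
>   PetscScalar,pointer :: u_array(:,:,:), u1(:,:,:)
>   PetscErrorCode ierr
>
>   call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>   u1(1:,1:,1:) => u_array      ! u1(1,1,1) refers to the lowest-index entry of u_array
>   ! ... existing 1-based code can access u1(i,j,k) ...
>   call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)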
>
> Barry
>
> Thanks.
>
> Barry
>
> On Apr 18, 2014, at 10:58 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
> I tried to pinpoint the problem. I reduced my job size so that I can run on 1 processor. I tried using valgrind, but perhaps because I'm using the optimized version it didn't catch the error, beyond saying "Segmentation fault (core dumped)".
>
> However, by re-writing my code, I found out a few things:
>
> 1. If I write my code this way:
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
> u_array = ....
> v_array = ....
> w_array = ....
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> The code runs fine.
>
> 2. If I write my code this way:
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
> call uvw_array_change(u_array,v_array,w_array)  -> this subroutine does the same modification as the above.
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)  -> error
>
> where the subroutine is:
>
> subroutine uvw_array_change(u,v,w)
>
> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>
> u ...
> v ...
> w ...
>
> end subroutine uvw_array_change
>
> The above will give an error at:
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> 3. Same as above, except I change the order of the last 3 lines to:
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> So they are now in reversed order. Now it works.
>
> 4. Same as 2 or 3, except the subroutine is changed to:
>
> subroutine uvw_array_change(u,v,w)
>
> real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
> real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
> real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
> u ...
> v ...
> w ...
>
> end subroutine uvw_array_change
>
> The start_indices and end_indices simply shift the 0-based indices of the C convention to the 1-based indices of the Fortran convention. This is necessary in my case because most of my code starts array counting at 1, hence the "trick".
>
> However, now no matter which order the DMDAVecRestoreArrayF90 calls are in (as in 2 or 3), an error occurs at "call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>
> So did I violate something and cause memory corruption with the trick above? But I can't think of any way other than the "trick" to continue using the 1-based convention.
>
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 15/4/2014 8:00 PM, Barry Smith wrote:
>
> Try running under valgrind
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
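>
> For reference, roughly the invocation that FAQ entry describes (a --with-debugging=yes build of PETSc gives much more readable stack traces):
>
>   mpirun -n 1 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./a.out -malloc off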
>
>
> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi Barry,
>
> As I mentioned earlier, the code
> works fine in PETSc debug mode but
> fails in non-debug mode.
>
> I have attached my code.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 15/4/2014 2:26 AM, Barry Smith
> wrote:
>
> Please send the code that
> creates da_w and the
> declarations of w_array
>
> Barry
>
> On Apr 14, 2014, at 9:40 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
>
> Hi Barry,
>
> I'm not too sure how to do it. I'm running MPI, so I run:
>
> mpirun -n 4 ./a.out -start_in_debugger
>
> I got the message below. Before the gdb windows appear (through X11), the program aborts.
>
> I also tried running on another cluster and it worked. I also tried the current cluster in debug mode, and it worked too.
>
> mpirun -n 4 ./a.out -start_in_debugger
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
>   Local host:          n12-76 (PID 20235)
>   MPI_COMM_WORLD rank: 2
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --------------------------------------------------------------------------
> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> ....
>
> 1
> [1]PETSC ERROR: ------------------------------------------------------------------------
> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [1]PETSC ERROR: to get more information on the crash.
> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
> [3]PETSC ERROR: ------------------------------------------------------------------------
> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [3]PETSC ERROR: to get more information on the crash.
> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>
> ...
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 14/4/2014 9:05 PM, Barry Smith wrote:
>
> Because IO doesn't always get flushed immediately, it may not be hanging at this point. It is better to use the option -start_in_debugger, then type cont in each debugger window, and then, when you think it is "hanging", do a control-C in each debugger window and type where to see where each process is. You can also look around in the debugger at variables to see why it is "hanging" at that point.
>
> Barry
>
> These routines don't have any parallel communication in them, so they are unlikely to hang.
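>
> Roughly, that workflow looks like the following (cont and where are gdb commands; the exact windows depend on your X forwarding setup):
>
>   mpirun -n 4 ./a.out -start_in_debugger
>   (gdb) cont       # in each debugger window that appears
>   # ... when the run seems to hang, hit Ctrl-C in each window ...
>   (gdb) where      # shows where that process is stopped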
>
> On Apr 14, 2014, at 6:52 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
>
>
> Hi,
>
> My code hangs, and I added in mpi_barrier and print statements to catch the bug. I found that it hangs after printing "7". Is it because I'm doing something wrong? I need to access the u,v,w arrays, so I use DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
> call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"3"
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
> call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"4"
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
> call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"5"
> call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
> call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"6"
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  !must be in reverse order
> call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"7"
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
> call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"8"
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
> --
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
>
>
> <code.txt>
>
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener