[petsc-users] Problem with DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90
TAY wee-beng
zonexo at gmail.com
Sun May 18 22:28:13 CDT 2014
On 19/5/2014 9:53 AM, Matthew Knepley wrote:
> On Sun, May 18, 2014 at 8:18 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi Barry,
>
>     I am trying to sort out the details so that it's easier to
>     pinpoint the error. However, I tried with gnu gfortran and it worked
>     well. With intel ifort, it stopped at one of the DMDAVecGetArrayF90
>     calls. Does that definitely mean it's a bug in ifort? Do you work
>     with both intel and gnu?
>
>
> Yes it works with Intel. Is this using optimization?
Hi Matt,
I forgot to add that in the non-optimized case, it works with both gnu and
intel. However, in the optimized case, it works with gnu but not intel.
Does that definitely mean it's a bug in ifort?
>
> Matt
>
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 14/5/2014 12:03 AM, Barry Smith wrote:
>
>         Please send your current code so we may compile and run it.
>
> Barry
>
>
>     On May 12, 2014, at 9:52 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
>         I sent the entire code a while ago. Is there any
>         answer? I have also been trying myself, but it works with some
>         intel compilers and not others. I'm still not able to find
>         the answer. The gnu compilers on most clusters are old
>         versions, so they are not able to compile my code since I have
>         allocatable structures.
>
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 21/4/2014 8:58 AM, Barry Smith wrote:
>
> Please send the entire code. If we can run it and
> reproduce the problem we can likely track down the
> issue much faster than through endless rounds of email.
>
> Barry
>
>             On Apr 20, 2014, at 7:49 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> On 20/4/2014 8:39 AM, TAY wee-beng wrote:
>
> On 20/4/2014 1:02 AM, Matthew Knepley wrote:
>
>                     On Sat, Apr 19, 2014 at 10:49 AM, TAY wee-beng <zonexo at gmail.com> wrote:
> On 19/4/2014 11:39 PM, Matthew Knepley wrote:
>
>                         On Sat, Apr 19, 2014 at 10:16 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>                         On 19/4/2014 10:55 PM, Matthew Knepley wrote:
>
>                             On Sat, Apr 19, 2014 at 9:14 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>                             On 19/4/2014 6:48 PM, Matthew Knepley wrote:
>
>                                 On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>                                 On 19/4/2014 1:17 PM, Barry Smith wrote:
>                                 On Apr 19, 2014, at 12:11 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
>                                 On 19/4/2014 12:10 PM, Barry Smith wrote:
>                                 On Apr 18, 2014, at 9:57 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
>                                 On 19/4/2014 3:53 AM, Barry Smith wrote:
>                                 Hmm,
>
>                                    Interface DMDAVecGetArrayF90
>                                      Subroutine DMDAVecGetArrayF903(da1,v,d1,ierr)
>                                        USE_DM_HIDE
>                                        DM_HIDE da1
>                                        VEC_HIDE v
>                                        PetscScalar,pointer :: d1(:,:,:)
>                                        PetscErrorCode ierr
>                                      End Subroutine
>
>                                 So the d1 is a F90 POINTER. But your subroutine
>                                 seems to be treating it as a “plain old Fortran array”?
>
>                                 real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
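For reference, a minimal sketch of how the caller's side usually looks when the argument matches that F90 pointer interface (da_u, u_local and u_array are placeholder names from my code, not fixed PETSc names):

      PetscScalar, pointer :: u_array(:,:,:)
      PetscErrorCode ierr

      call DMDAVecGetArrayF90(da_u, u_local, u_array, ierr)
      u_array = 0.d0    ! work on the local portion through the pointer
      call DMDAVecRestoreArrayF90(da_u, u_local, u_array, ierr)

The pointer declaration is what allows the array to carry the per-process index ranges mentioned later in this thread.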
> Hi,
>
>                                 So d1 is a pointer, and it's different if I declare it as a
>                                 "plain old Fortran array"? Because I declare it as a Fortran
>                                 array and it works without any problem if I only call
>                                 DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u".
>
>                                 But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90
>                                 with "u", "v" and "w", errors start to happen. I wonder why...
>
>                                 Also, suppose I call:
>
>                                 call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
>                                 call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
>                                 call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
>                                 u_array ....
>
>                                 v_array .... etc
>
>                                 Now to restore the arrays, does the order in which they
>                                 are restored matter?
> No it should not matter.
> If it matters that is a sign
> that memory has been written
> to incorrectly earlier in the
> code.
>
> Hi,
>
>                                 Hmm, I have been getting different results with different
>                                 intel compilers. I'm not sure if MPI plays a part, but I'm
>                                 only using a single processor. In debug mode, things run
>                                 without problem. In optimized mode, in some cases, the code
>                                 aborts even when doing simple initialization:
>
>
>                                 call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
>                                 call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
>                                 call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
>                                 call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>
>                                 u_array = 0.d0
>
>                                 v_array = 0.d0
>
>                                 w_array = 0.d0
>
>                                 p_array = 0.d0
>
>                                 call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>
>                                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
>                                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
>                                 The code aborts at the call to
>                                 DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr), giving a
>                                 segmentation error. But other versions of the intel compiler
>                                 pass through this part without error. Since the behavior
>                                 differs among compilers, is this a PETSc bug or an intel bug?
>                                 Or mvapich or openmpi?
>
>                             We do this in a bunch of examples. Can you reproduce
>                             this different behavior in
>                             src/dm/examples/tutorials/ex11f90.F?
>
> Hi Matt,
>
>                             Do you mean putting the above lines into ex11f90.F and testing?
>
> It already has DMDAVecGetArray().
> Just run it.
>
> Hi,
>
>                         It worked. The differences between my code and ex11f90 are the way
>                         the fortran modules are defined, and that ex11f90 only uses global
>                         vectors. Does it make a difference whether global or local vectors
>                         are used? Because the way it accesses x1 only touches the local
>                         region.
>
>                     No, the global/local difference should not matter.
>
>                         Also, before using DMDAVecGetArrayF90, must DMGetGlobalVector be
>                         used first? I can't find the equivalent for a local vector though.
>
> DMGetLocalVector()
>
>                     Oops, I do not have DMGetLocalVector and DMRestoreLocalVector in my
>                     code. Does it matter?
>
> If so, when should I call them?
>
> You just need a local vector from somewhere.
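For what it's worth, a minimal sketch of the Get/Restore pair for a local vector (da_u and u_local are placeholder names from my code):

      Vec u_local
      PetscErrorCode ierr

      call DMGetLocalVector(da_u, u_local, ierr)
      ! ... DMDAVecGetArrayF90 / work on the array / DMDAVecRestoreArrayF90 ...
      call DMRestoreLocalVector(da_u, u_local, ierr)

DMCreateLocalVector() followed later by VecDestroy() should also work; the Get/Restore pair just borrows a vector that the DM manages internally.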
>
> Hi,
>
>             Can anyone help with the questions below? I'm still
>             trying to find out why my code doesn't work.
>
> Thanks.
>
> Hi,
>
>             I inserted part of my error region code into ex11f90:
>
>             call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>             call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>             call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>             call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>
>             u_array = 0.d0
>             v_array = 0.d0
>             w_array = 0.d0
>             p_array = 0.d0
>
>             call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>             call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>             call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>             call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
>             It worked without error. I'm going to change the way the modules
>             are defined in my code.
>
>             My code contains a main program and a number of module files,
>             with subroutines inside, e.g.
>
> module solve
> <- add include file?
> subroutine RRK
> <- add include file?
> end subroutine RRK
>
> end module solve
>
>             So where should the include files (#include <finclude/petscdmda.h90>) be placed?
>
> After the module or inside the subroutine?
>
> Thanks.
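To make the question concrete, a minimal sketch of one possible arrangement (the placement and the exact set of finclude headers are assumptions on my part, not a statement of the recommended way):

   module solve
      implicit none
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscdm.h>
#include <finclude/petscdmda.h>
#include <finclude/petscdmda.h90>

   contains

      subroutine RRK(da_u, u_local, ierr)
         DM da_u
         Vec u_local
         PetscErrorCode ierr
         PetscScalar, pointer :: u_array(:,:,:)

         call DMDAVecGetArrayF90(da_u, u_local, u_array, ierr)
         u_array = 0.d0
         call DMDAVecRestoreArrayF90(da_u, u_local, u_array, ierr)
      end subroutine RRK

   end module solve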
>
> Matt
> Thanks.
>
> Matt
> Thanks.
>
> Matt
> Thanks
>
> Regards.
>
> Matt
> As in w, then v and u?
>
>                                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>                                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> thanks
>                                 Note also that the beginning and end indices of u,v,w are
>                                 different for each process; see for example
>                                 http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F
>                                 (and they do not start at 1). This is how to get the loop bounds.
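A minimal sketch of what querying those bounds looks like in Fortran (da_u and u_array are placeholder names; i, j, k are ordinary loop counters):

      PetscInt xs, ys, zs, xm, ym, zm
      PetscInt i, j, k
      PetscErrorCode ierr

      ! owned portion of this process (use DMDAGetGhostCorners for a
      ! local, ghosted vector instead)
      call DMDAGetCorners(da_u, xs, ys, zs, xm, ym, zm, ierr)

      do k = zs, zs + zm - 1
         do j = ys, ys + ym - 1
            do i = xs, xs + xm - 1
               u_array(i, j, k) = 0.d0
            end do
         end do
      end do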
> Hi,
>
>                                 In my case, I fixed the u,v,w such that their indices are the
>                                 same. I also checked using DMDAGetCorners and
>                                 DMDAGetGhostCorners. Now the problem lies in my subroutine
>                                 treating them as “plain old Fortran arrays”.
>
>                                 If I declare them as pointers, their indices follow the C
>                                 0-based convention, is that so?
> Not really. It is that in
> each process you need to
> access them from the indices
> indicated by DMDAGetCorners()
> for global vectors and
> DMDAGetGhostCorners() for
> local vectors. So really C or
> Fortran doesn’t make any
> difference.
>
>
>                                 So my problem now is that in my old MPI code, the u(i,j,k)
>                                 follows the Fortran 1-based convention. Is there some way to
>                                 handle this so that I do not have to change my u(i,j,k) to
>                                 u(i-1,j-1,k-1)?
>                                 If your code wishes to access them with indices plus one from
>                                 the values returned by DMDAGetCorners() for global vectors and
>                                 DMDAGetGhostCorners() for local vectors, then you need to
>                                 manually subtract off the 1.
>
> Barry
>
> Thanks.
> Barry
>
>                                 On Apr 18, 2014, at 10:58 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi,
>
>                                 I tried to pinpoint the problem. I reduced my job size so that
>                                 I can run on 1 processor. I tried using valgrind, but perhaps
>                                 because I'm using the optimized version, it didn't catch the
>                                 error, only saying "Segmentation fault (core dumped)".
>
> However, by re-writing my
> code, I found out a few things:
>
> 1. if I write my code this way:
>
>                                 call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
>                                 call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
>                                 call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
>                                 u_array = ....
>
>                                 v_array = ....
>
>                                 w_array = ....
>
>                                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
>                                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> The code runs fine.
>
> 2. if I write my code this way:
>
>                                 call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
>                                 call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
>                                 call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
>                                 call uvw_array_change(u_array,v_array,w_array)
>                                 -> this subroutine does the same modification as the above.
>
>                                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
>                                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>                                 -> error
>
> where the subroutine is:
>
> subroutine uvw_array_change(u,v,w)
>
>                                 real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>
> u ...
> v...
> w ...
>
> end subroutine uvw_array_change.
>
>                                 The above will give an error at:
>
>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> 3. Same as above, except I
> change the order of the last 3
> lines to:
>
>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
>                                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
>                                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> So they are now in reversed
> order. Now it works.
>
>                                 4. Same as 2 or 3, except the subroutine is changed to:
>
> subroutine uvw_array_change(u,v,w)
>
>                                 real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
>                                 real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
>                                 real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
> u ...
> v...
> w ...
>
> end subroutine uvw_array_change.
>
>                                 The start_indices and end_indices simply shift the 0-based
>                                 indices of the C convention to the 1-based indices of the
>                                 Fortran convention. This is necessary in my case because most
>                                 of my code starts array counting at 1, hence the "trick".
>
>                                 However, now no matter which order the DMDAVecRestoreArrayF90
>                                 calls are in (as in 2 or 3), an error occurs at
>                                 "call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".
>
>                                 So did I violate something and cause memory corruption with
>                                 the trick above? But I can't think of any way other than the
>                                 "trick" to keep using the 1-based indexing convention.
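For what it's worth, Fortran also lets an assumed-shape dummy argument carry an explicit lower bound, so the shape still comes from the caller (as in case 2) while the subroutine indexes from whatever origin is convenient. A minimal sketch, with the starting indices passed in explicitly (sx, sy, sz are placeholder names, not anything from PETSc):

   subroutine uvw_array_change(u, v, w, sx, sy, sz)
      integer, intent(in) :: sx, sy, sz
      real(8), intent(inout) :: u(sx:,sy:,sz:), v(sx:,sy:,sz:), w(sx:,sy:,sz:)

      ! inside, u(i,j,k) is addressed with i running from sx, j from sy, k from sz
      u = 0.d0
      v = 0.d0
      w = 0.d0
   end subroutine uvw_array_change

Whether this avoids the crash is a separate question, since the failure only shows up with optimization; it just removes the explicit-shape declaration from the picture.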
>
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
>                                 On 15/4/2014 8:00 PM, Barry Smith wrote:
> Try running under valgrind
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>
>
>                                 On Apr 14, 2014, at 9:47 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
> Hi Barry,
>
> As I mentioned earlier, the
> code works fine in PETSc debug
> mode but fails in non-debug mode.
>
> I have attached my code.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
>                                 On 15/4/2014 2:26 AM, Barry Smith wrote:
> Please send the code that
> creates da_w and the
> declarations of w_array
>
> Barry
>
>                                 On Apr 14, 2014, at 9:40 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
>
> Hi Barry,
>
>                                 I'm not too sure how to do it. I'm running MPI, so I run:
>
>                                 mpirun -n 4 ./a.out -start_in_debugger
>
>                                 I got the message below. Before the gdb windows appear
>                                 (through X11), the program aborts.
>
>                                 I also tried running on another cluster and it worked. I also
>                                 tried the current cluster in debug mode and it worked too.
>
>                                 mpirun -n 4 ./a.out -start_in_debugger
>                                 --------------------------------------------------------------------------
>                                 An MPI process has executed an operation involving a call to the
>                                 "fork()" system call to create a child process.  Open MPI is currently
>                                 operating in a condition that could result in memory corruption or
>                                 other system errors; your MPI job may hang, crash, or produce silent
>                                 data corruption.  The use of fork() (or system() or other calls that
>                                 create child processes) is strongly discouraged.
>
>                                 The process that invoked fork was:
>
>                                   Local host:          n12-76 (PID 20235)
>                                   MPI_COMM_WORLD rank: 2
>
>                                 If you are *absolutely sure* that your application will successfully
>                                 and correctly survive a call to fork(), you may disable this warning
>                                 by setting the mpi_warn_on_fork MCA parameter to 0.
>                                 --------------------------------------------------------------------------
>                                 [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>                                 [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>                                 [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>                                 [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>                                 [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>                                 [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> ....
>
> 1
>                                 [1]PETSC ERROR: ------------------------------------------------------------------------
>                                 [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>                                 [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>                                 [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>                                 [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>                                 [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>                                 [1]PETSC ERROR: to get more information on the crash.
>                                 [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>                                 [3]PETSC ERROR: ------------------------------------------------------------------------
>                                 [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>                                 [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>                                 [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>                                 [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>                                 [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>                                 [3]PETSC ERROR: to get more information on the crash.
>                                 [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>
> ...
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
>                                 On 14/4/2014 9:05 PM, Barry Smith wrote:
>
>                                 Because IO doesn’t always get flushed immediately, it may not
>                                 be hanging at this point. It is better to use the option
>                                 -start_in_debugger, then type cont in each debugger window, and
>                                 then when you think it is “hanging” do a control-C in each
>                                 debugger window and type where to see where each process is.
>                                 You can also look around in the debugger at variables to see
>                                 why it is “hanging” at that point.
>
>                                 Barry
>
>                                 These routines don’t have any parallel communication in them,
>                                 so they are unlikely to hang.
>
>                                 On Apr 14, 2014, at 6:52 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>
>
>
> Hi,
>
>                                 My code hangs, and I added mpi_barrier and print statements to
>                                 catch the bug. I found that it hangs after printing "7". Is it
>                                 because I'm doing something wrong? I need to access the u,v,w
>                                 arrays, so I use DMDAVecGetArrayF90. After access, I use
>                                 DMDAVecRestoreArrayF90.
>
>                                 call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>                                 call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>                                 call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>                                 call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>                                 call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>                                 call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>                                 call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>                                 call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>                                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  !must be in reverse order
>                                 call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>                                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>                                 call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
> --
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
>
>
> <code.txt>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener