[petsc-users] Problem with DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90

TAY wee-beng zonexo at gmail.com
Sat Apr 19 10:16:28 CDT 2014


On 19/4/2014 10:55 PM, Matthew Knepley wrote:
> On Sat, Apr 19, 2014 at 9:14 AM, TAY wee-beng <zonexo at gmail.com 
> <mailto:zonexo at gmail.com>> wrote:
>
>     On 19/4/2014 6:48 PM, Matthew Knepley wrote:
>>     On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng <zonexo at gmail.com
>>     <mailto:zonexo at gmail.com>> wrote:
>>
>>         On 19/4/2014 1:17 PM, Barry Smith wrote:
>>
>>             On Apr 19, 2014, at 12:11 AM, TAY wee-beng
>>             <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>>
>>                 On 19/4/2014 12:10 PM, Barry Smith wrote:
>>
>>                     On Apr 18, 2014, at 9:57 PM, TAY wee-beng
>>                     <zonexo at gmail.com <mailto:zonexo at gmail.com>> wrote:
>>
>>                         On 19/4/2014 3:53 AM, Barry Smith wrote:
>>
>>                                Hmm,
>>
>>                                    Interface DMDAVecGetArrayF90
>>                                      Subroutine
>>                             DMDAVecGetArrayF903(da1, v,d1,ierr)
>>                                        USE_DM_HIDE
>>                                        DM_HIDE da1
>>                                        VEC_HIDE v
>>                                        PetscScalar,pointer :: d1(:,:,:)
>>                                        PetscErrorCode ierr
>>                                      End Subroutine
>>
>>                                 So the d1 is an F90 POINTER. But your
>>                             subroutine seems to be treating it as a
>>                             “plain old Fortran array”?
>>
>>                             real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>
>>                 Hi,
>>
>>                 So d1 is a pointer, and it behaves differently if I
>>                 declare it as a "plain old Fortran array"? I declared it
>>                 as a Fortran array and it works without any problem if I
>>                 only call DMDAVecGetArrayF90 and
>>                 DMDAVecRestoreArrayF90 with "u".
>>
>>                 But if I call DMDAVecGetArrayF90 and
>>                 DMDAVecRestoreArrayF90 with "u", "v" and "w", errors
>>                 start to happen. I wonder why...
>>
>>                 Also, suppose I call:
>>
>>                 call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>
>>                     call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>
>>                     call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>
>>                 u_array ....
>>
>>                 v_array .... etc
>>
>>                 Now, when restoring the arrays, does the order in
>>                 which they are restored matter?
>>
>>                 No, it should not matter. If it does matter, that is a
>>             sign that memory has been written to incorrectly earlier
>>             in the code.
>>
>>         Hi,
>>
>>         Hmm, I have been getting different results with different
>>         Intel compilers. I'm not sure if MPI plays a part, but I'm
>>         only using a single processor. In debug mode, things run
>>         without problems. In optimized mode, in some cases, the code
>>         aborts even during simple initialization:
>>
>>
>>         call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>
>>             call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>
>>             call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>
>>             call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)
>>
>>             u_array = 0.d0
>>
>>             v_array = 0.d0
>>
>>             w_array = 0.d0
>>
>>             p_array = 0.d0
>>
>>
>>             call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
>>
>>
>>             call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>
>>             call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>
>>             call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>>         The code aborts at call
>>         DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr) with a
>>         segmentation error. But another version of the Intel compiler
>>         passes through this part without error. Since the behavior
>>         differs between compilers, is this a PETSc or Intel bug? Or
>>         an MVAPICH or Open MPI one?
>>
>>
>>     We do this in a bunch of examples. Can you reproduce this
>>     different behavior in src/dm/examples/tutorials/ex11f90.F?
>
>     Hi Matt,
>
>     Do you mean putting the above lines into ex11f90.F and testing?
>
>
> It already has DMDAVecGetArray(). Just run it.

Hi,

It worked. The differences between my code and the example are the way 
the Fortran modules are defined, and that ex11f90 only uses global 
vectors. Does it make a difference whether global or local vectors are 
used? The way it accesses x1 only touches the local region.

Also, before using DMDAVecGetArrayF90, DMGetGlobalVector must be called 
first, is that so? I can't find the equivalent for local vectors, though.
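
For reference, this is roughly what I have in mind for the local-vector 
case (just a sketch; da_u, u_global and the other names are placeholders 
from my code, and I am assuming DMGetLocalVector / DMRestoreLocalVector 
are the local counterparts of DMGetGlobalVector / DMRestoreGlobalVector):

     call DMGetLocalVector(da_u,u_local,ierr)
     ! fill the ghosted local vector from the global vector
     call DMGlobalToLocalBegin(da_u,u_global,INSERT_VALUES,u_local,ierr)
     call DMGlobalToLocalEnd(da_u,u_global,INSERT_VALUES,u_local,ierr)
     call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
     ! loop bounds for a local (ghosted) vector come from DMDAGetGhostCorners
     u_array = 0.d0
     call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
     call DMRestoreLocalVector(da_u,u_local,ierr)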

Thanks.
>
>    Matt
>
>     Thanks
>
>     Regards.
>>
>>        Matt
>>
>>                 As in w, then v and u?
>>
>>                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>>                 thanks
>>
>>                             Note also that the beginning and end
>>                             indices of u, v, w are different for
>>                             each process; see for example
>>                             http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F
>>                             (and they do not start at 1). This is
>>                             how to get the loop bounds.
>>
>>                         Hi,
>>
>>                         In my case, I fixed u, v and w such that
>>                         their indices are the same. I also checked
>>                         using DMDAGetCorners and DMDAGetGhostCorners.
>>                         Now the problem lies in my subroutine
>>                         treating them as "plain old Fortran arrays".
>>
>>                         If I declare them as pointers, do their
>>                         indices follow the C 0-based convention?
>>
>>                         Not really. On each process you need to
>>                     access them using the indices indicated by
>>                     DMDAGetCorners() for global vectors and
>>                     DMDAGetGhostCorners() for local vectors. So
>>                     really C or Fortran doesn’t make any difference.
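>>
>>                     For example, something like this (just a sketch;
>>                     da, u_array and the corner variables are
>>                     placeholders) uses the correct bounds for a
>>                     global vector:
>>
>>                         PetscInt xs,ys,zs,xm,ym,zm
>>                         call DMDAGetCorners(da,xs,ys,zs,xm,ym,zm,ierr)
>>                         do k = zs, zs+zm-1
>>                           do j = ys, ys+ym-1
>>                             do i = xs, xs+xm-1
>>                               u_array(i,j,k) = 0.d0
>>                             end do
>>                           end do
>>                         end do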
>>
>>
>>                         So my problem now is that in my old MPI code,
>>                         u(i,j,k) follows the Fortran 1-based
>>                         convention. Is there some way to manipulate
>>                         the indices so that I do not have to change
>>                         my u(i,j,k) to u(i-1,j-1,k-1)?
>>
>>                        If your code wishes to access them with
>>                     indices one greater than the values returned by
>>                     DMDAGetCorners() for global vectors and
>>                     DMDAGetGhostCorners() for local vectors, then
>>                     you need to manually subtract off the 1.
>>
>>                        Barry
>>
>>                         Thanks.
>>
>>                              Barry
>>
>>                             On Apr 18, 2014, at 10:58 AM, TAY
>>                             wee-beng <zonexo at gmail.com
>>                             <mailto:zonexo at gmail.com>> wrote:
>>
>>                                 Hi,
>>
>>                                 I tried to pinpoint the problem. I
>>                                 reduced my job size so that I can
>>                                 run on 1 processor. I tried using
>>                                 valgrind, but perhaps because I'm
>>                                 using the optimized version, it
>>                                 didn't catch the error beyond saying
>>                                 "Segmentation fault (core dumped)".
>>
>>                                 However, by rewriting my code, I
>>                                 found out a few things:
>>
>>                                 1. If I write my code this way:
>>
>>                                 call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>
>>                                 call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>
>>                                 call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>
>>                                 u_array = ....
>>
>>                                 v_array = ....
>>
>>                                 w_array = ....
>>
>>                                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>
>>                                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>
>>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>>                                 The code runs fine.
>>
>>                                 2. If I write my code this way:
>>
>>                                 call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>
>>                                 call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>
>>                                 call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>
>>                                 call uvw_array_change(u_array,v_array,w_array)
>>                                 -> this subroutine does the same
>>                                 modification as the above.
>>
>>                                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>
>>                                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>
>>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>                                 -> error
>>
>>                                 where the subroutine is:
>>
>>                                 subroutine uvw_array_change(u,v,w)
>>
>>                                 real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>>
>>                                 u ...
>>                                 v ...
>>                                 w ...
>>
>>                                 end subroutine uvw_array_change
>>
>>                                 The above gives an error at:
>>
>>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>>                                 3. Same as above, except I change the
>>                                 order of the last 3 lines to:
>>
>>                                 call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>
>>                                 call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>
>>                                 call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>>
>>                                 So they are now in reverse order, and
>>                                 now it works.
>>
>>                                 4. Same as 2 or 3, except the
>>                                 subroutine is changed to:
>>
>>                                 subroutine uvw_array_change(u,v,w)
>>
>>                                 real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>
>>                                 real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>
>>                                 real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>>
>>                                 u ...
>>                                 v ...
>>                                 w ...
>>
>>                                 end subroutine uvw_array_change
>>
>>                                 The start_indices and end_indices
>>                                 simply shift the 0-based indices of
>>                                 the C convention to the 1-based
>>                                 indices of the Fortran convention.
>>                                 This is necessary in my case because
>>                                 most of my code starts array counting
>>                                 at 1, hence the "trick".
>>
>>                                 However, now no matter which order the
>>                                 DMDAVecRestoreArrayF90 calls are in
>>                                 (as in 2 or 3), an error occurs at
>>                                 "call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)"
>>
>>                                 So did I violate something and cause
>>                                 memory corruption with the trick
>>                                 above? But I can't think of any way
>>                                 other than the "trick" to keep using
>>                                 the 1-based indexing convention.
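>>
>>                                 Would declaring the dummy arguments
>>                                 as F90 pointers avoid the problem, so
>>                                 that the subroutine sees exactly the
>>                                 bounds DMDAVecGetArrayF90 set? A rough
>>                                 sketch of what I mean (assuming the
>>                                 subroutine has an explicit interface,
>>                                 e.g. it is placed in a module):
>>
>>                                 subroutine uvw_array_change(u,v,w)
>>
>>                                 ! pointer dummies keep the lower bounds
>>                                 ! that DMDAVecGetArrayF90 gave the arrays
>>                                 PetscScalar, pointer :: u(:,:,:),v(:,:,:),w(:,:,:)
>>
>>                                 u ...
>>                                 v ...
>>                                 w ...
>>
>>                                 end subroutine uvw_array_change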
>>
>>                                 Thank you.
>>
>>                                 Yours sincerely,
>>
>>                                 TAY wee-beng
>>
>>                                 On 15/4/2014 8:00 PM, Barry Smith wrote:
>>
>>                                      Try running under valgrind
>>                                     http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>
>>
>>                                     On Apr 14, 2014, at 9:47 PM, TAY
>>                                     wee-beng <zonexo at gmail.com
>>                                     <mailto:zonexo at gmail.com>> wrote:
>>
>>                                         Hi Barry,
>>
>>                                         As I mentioned earlier, the
>>                                         code works fine in PETSc
>>                                         debug mode but fails in
>>                                         non-debug mode.
>>
>>                                         I have attached my code.
>>
>>                                         Thank you
>>
>>                                         Yours sincerely,
>>
>>                                         TAY wee-beng
>>
>>                                         On 15/4/2014 2:26 AM, Barry
>>                                         Smith wrote:
>>
>>                                                Please send the code
>>                                             that creates da_w and the
>>                                             declarations of w_array
>>
>>                                                Barry
>>
>>                                             On Apr 14, 2014, at 9:40
>>                                             AM, TAY wee-beng
>>                                             <zonexo at gmail.com
>>                                             <mailto:zonexo at gmail.com>>
>>                                               wrote:
>>
>>
>>                                                 Hi Barry,
>>
>>                                                 I'm not too sure how
>>                                                 to do it. I'm running
>>                                                 MPI, so I run:
>>
>>                                                   mpirun -n 4 ./a.out -start_in_debugger
>>
>>                                                 I got the message
>>                                                 below. Before the gdb
>>                                                 windows appear (through
>>                                                 X11), the program aborts.
>>
>>                                                 I also tried running
>>                                                 on another cluster
>>                                                 and it worked. I also
>>                                                 tried the current
>>                                                 cluster in debug mode
>>                                                 and it worked too.
>>
>>                                                 mpirun -n 4 ./a.out -start_in_debugger
>>                                                 --------------------------------------------------------------------------
>>                                                 An MPI process has executed an operation involving a call to the
>>                                                 "fork()" system call to create a child process.  Open MPI is currently
>>                                                 operating in a condition that could result in memory corruption or
>>                                                 other system errors; your MPI job may hang, crash, or produce silent
>>                                                 data corruption.  The use of fork() (or system() or other calls that
>>                                                 create child processes) is strongly discouraged.
>>
>>                                                 The process that invoked fork was:
>>
>>                                                    Local host:          n12-76 (PID 20235)
>>                                                    MPI_COMM_WORLD rank: 2
>>
>>                                                 If you are *absolutely sure* that your application will successfully
>>                                                 and correctly survive a call to fork(), you may disable this warning
>>                                                 by setting the mpi_warn_on_fork MCA parameter to 0.
>>                                                 --------------------------------------------------------------------------
>>                                                 [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>>                                                 [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>>                                                 [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>>                                                 [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>>                                                 [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>>                                                 [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>
>>                                                 ....
>>
>>                                                   1
>>                                                 [1]PETSC ERROR: ------------------------------------------------------------------------
>>                                                 [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>                                                 [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>                                                 [1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>                                                 [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>                                                 [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>                                                 [1]PETSC ERROR: to get more information on the crash.
>>                                                 [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>                                                 [3]PETSC ERROR: ------------------------------------------------------------------------
>>                                                 [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>                                                 [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>                                                 [3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>                                                 [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>                                                 [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>                                                 [3]PETSC ERROR: to get more information on the crash.
>>                                                 [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>
>>                                                 ...
>>                                                 Thank you.
>>
>>                                                 Yours sincerely,
>>
>>                                                 TAY wee-beng
>>
>>                                                 On 14/4/2014 9:05 PM,
>>                                                 Barry Smith wrote:
>>
>>                                                        Because IO doesn’t always get flushed
>>                                                     immediately, it may not be hanging at this
>>                                                     point. It is better to use the option
>>                                                     -start_in_debugger, then type cont in each
>>                                                     debugger window, and then, when you think it
>>                                                     is “hanging”, do a control-C in each debugger
>>                                                     window and type where to see where each
>>                                                     process is. You can also look around in the
>>                                                     debugger at variables to see why it is
>>                                                     “hanging” at that point.
>>
>>                                                         Barry
>>
>>                                                        These routines don’t have any parallel
>>                                                     communication in them, so they are unlikely
>>                                                     to hang.
>>
>>                                                     On Apr 14, 2014,
>>                                                     at 6:52 AM, TAY
>>                                                     wee-beng
>>
>>                                                     <zonexo at gmail.com
>>                                                     <mailto:zonexo at gmail.com>>
>>
>>                                                       wrote:
>>
>>
>>
>>                                                         Hi,
>>
>>                                                         My code hangs, so I added MPI_Barrier
>>                                                         and print statements to catch the bug.
>>                                                         I found that it hangs after printing
>>                                                         "7". Is it because I'm doing something
>>                                                         wrong? I need to access the u, v, w
>>                                                         arrays, so I use DMDAVecGetArrayF90.
>>                                                         After access, I use
>>                                                         DMDAVecRestoreArrayF90.
>>
>>                                                         call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>                                                         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"3"
>>                                                         call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>                                                         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"4"
>>                                                         call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>                                                         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"5"
>>                                                         call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>                                                         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"6"
>>                                                         call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  !must be in reverse order
>>                                                         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"7"
>>                                                         call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>                                                         call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"8"
>>                                                         call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>                                                         -- 
>>                                                         Thank you.
>>
>>                                                         Yours sincerely,
>>
>>                                                         TAY wee-beng
>>
>>
>>
>>                                         <code.txt>
>>
>>
>>
>>
>>
>>     -- 
>>     What most experimenters take for granted before they begin their
>>     experiments is infinitely more interesting than any results to
>>     which their experiments lead.
>>     -- Norbert Wiener
>
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
