[petsc-users] Problem with DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90
Barry Smith
bsmith at mcs.anl.gov
Fri Apr 18 14:53:34 CDT 2014
Hmm,
Interface DMDAVecGetArrayF90
Subroutine DMDAVecGetArrayF903(da1, v,d1,ierr)
USE_DM_HIDE
DM_HIDE da1
VEC_HIDE v
PetscScalar,pointer :: d1(:,:,:)
PetscErrorCode ierr
End Subroutine
So the d1 is a F90 POINTER. But your subroutine seems to be treating it as a “plain old Fortran array”?
real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
Note also that the beginning and end indices of the u,v,w, are different for each process see for example http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F (and they do not start at 1). This is how to get the loop bounds.
Barry
On Apr 18, 2014, at 10:58 AM, TAY wee-beng <zonexo at gmail.com> wrote:
> Hi,
>
> I tried to pinpoint the problem. I reduced my job size and hence I can run on 1 processor. Tried using valgrind but perhaps I'm using the optimized version, it didn't catch the error, besides saying "Segmentation fault (core dumped)"
>
> However, by re-writing my code, I found out a few things:
>
> 1. if I write my code this way:
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
> u_array = ....
>
> v_array = ....
>
> w_array = ....
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> The code runs fine.
>
> 2. if I write my code this way:
>
> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>
> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>
> call uvw_array_change(u_array,v_array,w_array) -> this subroutine does the same modification as the above.
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr) -> error
>
> where the subroutine is:
>
> subroutine uvw_array_change(u,v,w)
>
> real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
>
> u ...
> v...
> w ...
>
> end subroutine uvw_array_change.
>
> The above will give an error at :
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> 3. Same as above, except I change the order of the last 3 lines to:
>
> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>
> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
>
> So they are now in reversed order. Now it works.
>
> 4. Same as 2 or 3, except the subroutine is changed to :
>
> subroutine uvw_array_change(u,v,w)
>
> real(8), intent(inout) :: u(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
> real(8), intent(inout) :: v(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
> real(8), intent(inout) :: w(start_indices(1):end_indices(1),start_indices(2):end_indices(2),start_indices(3):end_indices(3))
>
> u ...
> v...
> w ...
>
> end subroutine uvw_array_change.
>
> The start_indices and end_indices are simply to shift the 0 indices of C convention to that of the 1 indices of the Fortran convention. This is necessary in my case because most of my codes start array counting at 1, hence the "trick".
>
> However, now no matter which order of the DMDAVecRestoreArrayF90 (as in 2 or 3), error will occur at "call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr) "
>
> So did I violate and cause memory corruption due to the trick above? But I can't think of any way other than the "trick" to continue using the 1 indices convention.
>
> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 15/4/2014 8:00 PM, Barry Smith wrote:
>> Try running under valgrind http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>
>>
>> On Apr 14, 2014, at 9:47 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>> Hi Barry,
>>>
>>> As I mentioned earlier, the code works fine in PETSc debug mode but fails in non-debug mode.
>>>
>>> I have attached my code.
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 15/4/2014 2:26 AM, Barry Smith wrote:
>>>> Please send the code that creates da_w and the declarations of w_array
>>>>
>>>> Barry
>>>>
>>>> On Apr 14, 2014, at 9:40 AM, TAY wee-beng
>>>> <zonexo at gmail.com>
>>>> wrote:
>>>>
>>>>
>>>>> Hi Barry,
>>>>>
>>>>> I'm not too sure how to do it. I'm running mpi. So I run:
>>>>>
>>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>>
>>>>> I got the msg below. Before the gdb windows appear (thru x11), the program aborts.
>>>>>
>>>>> Also I tried running in another cluster and it worked. Also tried in the current cluster in debug mode and it worked too.
>>>>>
>>>>> mpirun -n 4 ./a.out -start_in_debugger
>>>>> --------------------------------------------------------------------------
>>>>> An MPI process has executed an operation involving a call to the
>>>>> "fork()" system call to create a child process. Open MPI is currently
>>>>> operating in a condition that could result in memory corruption or
>>>>> other system errors; your MPI job may hang, crash, or produce silent
>>>>> data corruption. The use of fork() (or system() or other calls that
>>>>> create child processes) is strongly discouraged.
>>>>>
>>>>> The process that invoked fork was:
>>>>>
>>>>> Local host: n12-76 (PID 20235)
>>>>> MPI_COMM_WORLD rank: 2
>>>>>
>>>>> If you are *absolutely sure* that your application will successfully
>>>>> and correctly survive a call to fork(), you may disable this warning
>>>>> by setting the mpi_warn_on_fork MCA parameter to 0.
>>>>> --------------------------------------------------------------------------
>>>>> [2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display localhost:50.0 on machine n12-76
>>>>> [0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display localhost:50.0 on machine n12-76
>>>>> [1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display localhost:50.0 on machine n12-76
>>>>> [3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display localhost:50.0 on machine n12-76
>>>>> [n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
>>>>> [n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>
>>>>> ....
>>>>>
>>>>> 1
>>>>> [1]PETSC ERROR: ------------------------------------------------------------------------
>>>>> [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>>> [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> [1]PETSC ERROR: or see
>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[1]PETSC ERROR: or try http://valgrind.org
>>>>> on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>> [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>>> [1]PETSC ERROR: to get more information on the crash.
>>>>> [1]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>>> [3]PETSC ERROR: ------------------------------------------------------------------------
>>>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>> [3]PETSC ERROR: or see
>>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind[3]PETSC ERROR: or try http://valgrind.org
>>>>> on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>> [3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>>>>> [3]PETSC ERROR: to get more information on the crash.
>>>>> [3]PETSC ERROR: User provided function() line 0 in unknown directory unknown file (null)
>>>>>
>>>>> ...
>>>>> Thank you.
>>>>>
>>>>> Yours sincerely,
>>>>>
>>>>> TAY wee-beng
>>>>>
>>>>> On 14/4/2014 9:05 PM, Barry Smith wrote:
>>>>>
>>>>>> Because IO doesn’t always get flushed immediately it may not be hanging at this point. It is better to use the option -start_in_debugger then type cont in each debugger window and then when you think it is “hanging” do a control C in each debugger window and type where to see where each process is you can also look around in the debugger at variables to see why it is “hanging” at that point.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> This routines don’t have any parallel communication in them so are unlikely to hang.
>>>>>>
>>>>>> On Apr 14, 2014, at 6:52 AM, TAY wee-beng
>>>>>>
>>>>>> <zonexo at gmail.com>
>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> My code hangs and I added in mpi_barrier and print to catch the bug. I found that it hangs after printing "7". Is it because I'm doing something wrong? I need to access the u,v,w array so I use DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.
>>>>>>>
>>>>>>> call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"3"
>>>>>>> call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"4"
>>>>>>> call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"5"
>>>>>>> call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"6"
>>>>>>> call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr) !must be in reverse order
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"7"
>>>>>>> call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
>>>>>>> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *,"8"
>>>>>>> call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
>>>>>>> --
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Yours sincerely,
>>>>>>>
>>>>>>> TAY wee-beng
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>>>
>>> <code.txt>
>
More information about the petsc-users
mailing list