[petsc-users] Code hangs when calling PetscIntView (MPI, fortran)
dazza simplythebest
sayosale at hotmail.com
Thu May 20 03:25:41 CDT 2021
Dear All,
As part of preparing a code to call the SLEPc eigenvalue-solving library,
I am constructing a matrix in sparse CSR format row by row. Purely for debugging
purposes I write out the column values for a given row, which are stored in a
PetscInt allocatable vector, using PetscIntView.
Everything works fine when the number of MPI processes exactly divides the
number of rows of the matrix, so that each process owns the same number of rows.
However, when the number of MPI processes does not exactly divide the
number of rows, so that the processes own differing numbers of rows,
the code hangs when it reaches the line that calls PetscIntView.
To be precise, the code hangs on the final row owned by a process other than root.
If, however, I comment out the call to PetscIntView, the code completes without error
and produces the correct eigenvalues (hence we are not missing or miswriting a row).
Note also that a simple direct write-out of this same array using a plain Fortran
write statement prints the array without problem.
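(For reference, the write statement in question is the one that appears in the attached code:

      write(*,*) 'JALOC', JALOC

which prints the array correctly on every rank.)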
I have attached below a small code that reproduces the problem.
For this code we have nominally assigned 200 rows to our matrix. The code runs without
problem on 1, 2, 4, 5, 8 or 10 MPI processes, all of which divide 200 exactly,
but hangs on, for example, 3 MPI processes.
For the case of 3 MPI processes the subroutine WHOSE_ROW_IS_IT allocates the rows
to the processes as follows:

    process no.   first row   last row   no. of rows
         0             1          66          66
         1            67         133          67
         2           134         200          67
The code will hang, for example, when process 1 calls PetscIntView for its last row, row 133.
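For reference, this split follows from the integer arithmetic in WHOSE_ROW_IS_IT
(attached below): with TOTAL_NO_ROWS = 200 and NO_PROCESSES = 3 we get P = 200/3 = 66
and REM = 200 - 66*3 = 2, so rows 1-66 (those with ROW_NO < (3-2)*66 + 1 = 67) give
OWNER = (ROW_NO-1)/66 = 0, while rows 67-200 give OWNER = ROW_NO/67, i.e. owner 1 for
rows 67-133 and owner 2 for rows 134-200, matching the table above.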
One additional piece of information that may be relevant: the code does run to completion
without hanging if I comment out the final SLEPc/MPI finalisation command
CALL SlepcFinalize(ierr_pets)
(I of course then get 'bad termination' errors, but otherwise the run is successful.)
I would appreciate any ideas on what is going wrong!
Many thanks,
Dan.
code:
MODULE ALL_STAB_ROUTINES
  IMPLICIT NONE
CONTAINS

  SUBROUTINE WHOSE_ROW_IS_IT(ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES, OWNER)
    ! This routine allocates rows as evenly as possible between MPI processes
#include <slepc/finclude/slepceps.h>
    use slepceps
    IMPLICIT NONE
    PetscInt, INTENT(IN)  :: ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES
    PetscInt, INTENT(OUT) :: OWNER
    PetscInt :: P, REM

    P   = TOTAL_NO_ROWS / NO_PROCESSES    ! note integer division
    REM = TOTAL_NO_ROWS - P*NO_PROCESSES  ! rows left over after even division
    IF (ROW_NO < (NO_PROCESSES - REM)*P + 1) THEN
      OWNER = (ROW_NO - 1)/P                             ! note integer division
    ELSE
      OWNER = (ROW_NO + NO_PROCESSES - REM - 1)/(P + 1)  ! note integer division
    ENDIF
  END SUBROUTINE WHOSE_ROW_IS_IT

END MODULE ALL_STAB_ROUTINES
PROGRAM trialer
  USE MPI
#include <slepc/finclude/slepceps.h>
  use slepceps
  USE ALL_STAB_ROUTINES
  IMPLICIT NONE

  PetscMPIInt    rank3, total_mpi_size
  PetscInt       nl3, code, PROC_ROW, ISTATUS, jm, N_rows, NO_A_ENTRIES
  PetscInt, ALLOCATABLE, DIMENSION(:) :: JALOC
  PetscInt, PARAMETER :: ZERO = 0, ONE = 1, TWO = 2, THREE = 3
  PetscErrorCode ierr_pets

  ! Initialise SLEPc/MPI
  call SlepcInitialize(PETSC_NULL_CHARACTER, ierr_pets)          ! note that this initialises MPI
  call MPI_COMM_SIZE(MPI_COMM_WORLD, total_mpi_size, ierr_pets)  ! find total no. of MPI processes
  nl3 = total_mpi_size
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank3, ierr_pets)           ! find my overall rank -> rank3
  write(*,*) 'Welcome: PROCESS NO., TOTAL NO. OF PROCESSES = ', rank3, nl3

  N_rows       = 200   ! number of rows of a notional matrix
  NO_A_ENTRIES = 12    ! number of entries for JALOC

  ! Loop over rows
  do jm = 1, N_rows
    CALL whose_row_is_it(jm, N_rows, nl3, PROC_ROW)  ! find out which process owns this row
    if (rank3 == PROC_ROW) then                      ! if this MPI process owns the row then ...
      ! Allocate the JALOC array and initialise it
      allocate(jaloc(NO_A_ENTRIES), STAT=ISTATUS)
      jaloc = three

      write(*,*) 'JALOC', JALOC                      ! this simple write always works
      write(*,*) 'calling PetscIntView: PROCESS NO., ROW NO.', rank3, jm
      ! This call to PetscIntView causes the code to hang when e.g. total_mpi_size=3, jm=133
      call PetscIntView(NO_A_ENTRIES, JALOC(1:NO_A_ENTRIES), &
     &     PETSC_VIEWER_STDOUT_WORLD, ierr_pets)
      CHKERRA(ierr_pets)

      deallocate(jaloc)
    endif
  enddo

  CALL SlepcFinalize(ierr_pets)
end program trialer
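For what it's worth, one variant I have not yet tested would be to do this debugging
output through the per-process viewer PETSC_VIEWER_STDOUT_SELF instead of the world
viewer, i.e. replacing the call above with something like

      ! untested sketch: per-process viewer instead of PETSC_VIEWER_STDOUT_WORLD
      call PetscIntView(NO_A_ENTRIES, JALOC(1:NO_A_ENTRIES), &
     &     PETSC_VIEWER_STDOUT_SELF, ierr_pets)
      CHKERRA(ierr_pets)

Would that be a sensible way to do this sort of per-row write-out, or is there a
recommended alternative?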