[petsc-users] Code hangs when calling PetscIntView (MPI, fortran)

dazza simplythebest sayosale at hotmail.com
Thu May 20 03:25:41 CDT 2021


Dear All,
             As part of preparing a code to call the SLEPc eigenvalue-solving library,
I am constructing a matrix in sparse CSR format row by row. Purely for debugging
purposes I write out the column indices for a given row, which are stored in an
allocatable PetscInt vector, using PetscIntView.

Everything works fine when the number of MPI processes exactly divides the
number of rows of the matrix, so that each process owns the same number of rows.
However, when the number of MPI processes does not exactly divide the number of
rows, so that the processes own different numbers of rows, the code hangs when it
reaches the line that calls PetscIntView.
To be precise, the code hangs on the final row owned by any process other than root.
If, however, I comment out the call to PetscIntView, the code completes without error
and produces the correct eigenvalues (hence we are not missing or miswriting a row).
Note also that a plain Fortran WRITE of the same array prints it without any problem.
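
For reference, the two forms of output being compared are shown in the minimal sketch
below (names follow the attached code; in this stripped-down version every rank executes
both statements, so it prints fine and does not reproduce the hang - it just isolates the
two calls):

      PROGRAM write_compare
#include <slepc/finclude/slepceps.h>
      use slepceps
      IMPLICIT NONE
      PetscInt, PARAMETER :: NO_A_ENTRIES = 12, THREE = 3
      PetscInt, ALLOCATABLE, DIMENSION(:) :: JALOC
      PetscErrorCode ierr_pets

      call SlepcInitialize(PETSC_NULL_CHARACTER, ierr_pets) ! also initialises MPI
      allocate(jaloc(NO_A_ENTRIES))
      jaloc = three

!     plain Fortran write of the array - always completes
      WRITE(*,*) 'JALOC', JALOC

!     PETSc write of the same array via the world viewer
      call PetscIntView(NO_A_ENTRIES, JALOC(1:NO_A_ENTRIES),              &
     &     PETSC_VIEWER_STDOUT_WORLD, ierr_pets)
      CHKERRA(ierr_pets)

      deallocate(jaloc)
      call SlepcFinalize(ierr_pets)
      END PROGRAM write_compare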

I have attached below a small program that reproduces the problem.
In this code the matrix is nominally assigned 200 rows. The code runs without
problem using 1, 2, 4, 5, 8 or 10 MPI processes, all of which divide 200 exactly,
but hangs for, e.g., 3 MPI processes.
For the case of 3 MPI processes the subroutine WHOSE_ROW_IS_IT allocates the rows
to each process as follows:
  process no.    first row    last row    no. of rows
       0              1           66           66
       1             67          133           67
       2            134          200           67
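
This split is just the integer arithmetic in WHOSE_ROW_IS_IT (see the code below). As a
sanity check, here is a plain-Fortran sketch of the same calculation with the PETSc types
stripped out (the program and variable names here are purely illustrative):

      PROGRAM split_check
      IMPLICIT NONE
      INTEGER, PARAMETER :: TOTAL_ROWS = 200, NPROCS = 3
      INTEGER :: ROW, OWNER, P, REM

      P   = TOTAL_ROWS / NPROCS       ! 66  (integer division)
      REM = TOTAL_ROWS - P*NPROCS     ! 2 leftover rows
!     the first (NPROCS - REM) processes get P rows, the rest get P+1
      DO ROW = 1, TOTAL_ROWS
        IF (ROW < (NPROCS - REM)*P + 1) THEN
          OWNER = (ROW - 1)/P
        ELSE
          OWNER = (ROW + NPROCS - REM - 1)/(P + 1)
        ENDIF
        SELECT CASE (ROW)                 ! print only the boundary rows
        CASE (1, 66, 67, 133, 134, 200)
          WRITE(*,*) 'ROW', ROW, 'OWNER', OWNER
        END SELECT
      ENDDO
      END PROGRAM split_check

This prints owner 0 for rows 1 and 66, owner 1 for rows 67 and 133, and owner 2 for
rows 134 and 200, in agreement with the table above.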

The code hangs when process 1 calls PetscIntView for its last row, row 133 in this example.

One piece of additional information that may be relevant: the code does run to completion
 without hanging if I comment out the final SLEPc/MPI finalisation command
 CALL SlepcFinalize(ierr_pets)
(I then of course get 'bad termination' errors, but otherwise the run is successful.)

 I would appreciate it if anyone has any ideas on what is going wrong!
  Many thanks,
                       Dan.


code:

      MODULE ALL_STAB_ROUTINES
      IMPLICIT NONE
      CONTAINS

      SUBROUTINE WHOSE_ROW_IS_IT(ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES,     &
     &      OWNER)
!     THIS ROUTINE ALLOCATES ROWS EVENLY BETWEEN mpi PROCESSES
#include <slepc/finclude/slepceps.h>
      use slepceps
      IMPLICIT NONE
      PetscInt, INTENT(IN) :: ROW_NO, TOTAL_NO_ROWS, NO_PROCESSES
      PetscInt, INTENT(OUT) :: OWNER
      PetscInt :: P, REM

      P = TOTAL_NO_ROWS / NO_PROCESSES ! NOTE INTEGER DIVISION
      REM = TOTAL_NO_ROWS - P*NO_PROCESSES
      IF (ROW_NO < (NO_PROCESSES - REM)*P + 1 ) THEN
        OWNER = (ROW_NO - 1)/P ! NOTE INTEGER DIVISION
      ELSE
        OWNER = (  ROW_NO  +   NO_PROCESSES - REM -1 )/(P+1) ! NOTE INTEGER DIVISION
      ENDIF
      END SUBROUTINE WHOSE_ROW_IS_IT
      END MODULE ALL_STAB_ROUTINES


      PROGRAM trialer
      USE MPI
#include <slepc/finclude/slepceps.h>
      use slepceps
      USE ALL_STAB_ROUTINES
      IMPLICIT NONE
      PetscMPIInt    rank3, total_mpi_size
      PetscInt nl3, code,  PROC_ROW, ISTATUS, jm, N_rows,NO_A_ENTRIES
      PetscInt, ALLOCATABLE, DIMENSION(:) :: JALOC
      PetscInt, PARAMETER  ::  ZERO = 0 , ONE = 1, TWO = 2, THREE = 3
      PetscErrorCode ierr_pets

! Initialise SLEPc/MPI
      call SlepcInitialize(PETSC_NULL_CHARACTER,ierr_pets) ! note that this initialises MPI
      call MPI_COMM_SIZE(MPI_COMM_WORLD, total_mpi_size, ierr_pets) !! find total no of MPI processes
      nl3 = total_mpi_size
      call MPI_COMM_RANK(MPI_COMM_WORLD,rank3,ierr_pets) !! find my overall rank -> rank3
      write(*,*)'Welcome: PROCESS NO , TOTAL NO. OF PROCESSES =  ',rank3, nl3

      N_rows = 200 ! NUMBER OF ROWS OF A NOTIONAL MATRIX
      NO_A_ENTRIES = 12 ! NUMBER OF ENTRIES FOR JALOC

!     LOOP OVER ROWS
      do jm = 1, N_rows

      CALL whose_row_is_it(JM,  N_rows , NL3, PROC_ROW) ! FIND OUT WHICH PROCESS OWNS ROW
      if (rank3 == PROC_ROW) then ! IF mpi PROCESS OWNS THIS ROW THEN ..
!       ALLOCATE jaloc ARRAY AND INITIALISE

        allocate(jaloc(NO_A_ENTRIES), STAT=ISTATUS )
        jaloc = three


        WRITE(*,*)'JALOC',JALOC ! THIS PLAIN FORTRAN WRITE ALWAYS WORKS
        write(*,*)'calling PetscIntView: PROCESS NO. ROW NO.',rank3, jm
        ! THIS CALL TO PetscIntView CAUSES CODE TO HANG WHEN E.G. total_mpi_size=3, JM=133
        call PetscIntView(NO_A_ENTRIES,JALOC(1:NO_A_ENTRIES),              &
     &       PETSC_VIEWER_STDOUT_WORLD, ierr_pets)
        CHKERRA(ierr_pets)
        deallocate(jaloc)
      endif
      enddo

      CALL SlepcFinalize(ierr_pets)
      end program trialer
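
For completeness, the hanging case above was produced simply by launching the program on
three ranks with a standard MPI launcher, along the lines of the command below (the exact
build and launch commands will of course depend on the local PETSc/SLEPc installation):

      mpiexec -n 3 ./trialer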
