[petsc-users] slepc NHEP error

Kannan, Ramakrishnan kannanr at ornl.gov
Thu Jun 15 12:41:25 CDT 2017


Barry/Jose,

It is a uniform random synthetic matrix. All processes have the same number of columns. Here is the code snippet for the matrix construction.

  Generate_petsc_mat(int n_rows, int n_cols, int n_nnz,
                     PetscInt *row_idx, PetscInt *col_idx, PetscScalar *val,
                     const MPICommunicator& communicator)
{
    // A, ierr, MPI_RANK, tic()/toc(), and the *PRINT* macros are defined elsewhere.
    // Compute the row distribution; used below to shift local row indices
    // to global ones for MatSetValues.
    int *start_row = new int[communicator.size()];
    int *all_proc_rows = new int[communicator.size()];
    int global_rows = 0;
    MPI_Allgather(&n_rows, 1, MPI_INT, all_proc_rows, 1, MPI_INT, MPI_COMM_WORLD);
    start_row[0] = 0;
    for (int i = 0; i < communicator.size(); i++) {
      if (i > 0) {
        // Rank i's first global row comes after the rows of ranks 0..i-1.
        start_row[i] = start_row[i - 1] + all_proc_rows[i - 1];
      }
      global_rows += all_proc_rows[i];
    }

    // find the max nnzs per row for preallocation.
    int *nnzs_per_rows = new int[n_rows];
    for (int i = 0; i < n_rows; i++) {
      nnzs_per_rows[i] = 0;
    }
    for (int i = 0; i < n_nnz; i++) {
      nnzs_per_rows[row_idx[i]]++;
    }
    int max_nnz_per_row = -1;
    for (int i = 0; i < n_rows; i++) {
      if (nnzs_per_rows[i] > max_nnz_per_row) {
        max_nnz_per_row = nnzs_per_rows[i];
      }
    }
    int max_cols;
    MPI_Allreduce(&n_cols,&max_cols,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD);
DISTPRINTINFO("rows,cols,nnzs, max_nnz_per_row" << n_rows  << ","
                  << n_cols << ","  << n_nnz << max_nnz_per_row);
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetType(A, MATMPIAIJ);
    // n_rows local rows; let PETSc pick the local column split of the
    // max_cols global columns.
    MatSetSizes(A, n_rows, PETSC_DECIDE, global_rows, max_cols);
    // Uniform overestimate for both the diagonal and off-diagonal blocks;
    // an exact per-row alternative is sketched after this snippet.
    MatMPIAIJSetPreallocation(A, max_nnz_per_row, PETSC_NULL, max_nnz_per_row, PETSC_NULL);

    PetscInt local_row_idx;
    PetscInt local_col_idx;
    PetscScalar local_val;

    int my_start_row = start_row[MPI_RANK];
    int my_start_col = 0;
    double petsc_insert_time = 0.0;
    for (int i = 0; i < n_nnz; i++) {
      local_row_idx = my_start_row + row_idx[i];   // global row index
      local_col_idx = my_start_col + col_idx[i];   // columns are already global
      local_val = val[i];
#ifdef TRICOUNT_VERBOSE
      DISTPRINTINFO(local_row_idx << "," << local_col_idx << "," << local_val);
#endif
      tic();
      ierr = MatSetValues(A, 1, &local_row_idx, 1, &local_col_idx, &local_val, INSERT_VALUES);
      petsc_insert_time += toc();
      if (i > 0 && i % 25000 == 0) {
        // Cumulative insertion time for the last 25000 entries.
        PRINTROOT("25000 time::" << petsc_insert_time);
        petsc_insert_time = 0;
      }
      CHKERRV(ierr);
    }
    PRINTROOT("100000 time::" << petsc_insert_time);
    tic();
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
    petsc_insert_time = toc();
    PRINTROOT("calling assembly to end::took::" << petsc_insert_time);
    PetscPrintf(PETSC_COMM_WORLD, "Matrix created\n");
    // Free the helper arrays allocated above.
    delete[] start_row;
    delete[] all_proc_rows;
    delete[] nnzs_per_rows;
}
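
By the way, the preallocation above uses the worst-case max_nnz_per_row for both the diagonal and off-diagonal blocks, which over-allocates. Below is a minimal, untested sketch of exact per-row preallocation that could replace the MatMPIAIJSetPreallocation call; it assumes PETSC_DECIDE splits the max_cols global columns the way PETSc's default layout does (an even split, with the remainder spread over the first ranks) and reuses MPI_RANK and communicator from the snippet:

    // Determine the diagonal-block column range [cstart, cend) that
    // PETSc will assign to this rank for a PETSC_DECIDE column split.
    int nprocs = communicator.size();
    PetscInt rem = max_cols % nprocs;
    PetscInt cstart = MPI_RANK * (max_cols / nprocs) + (MPI_RANK < rem ? MPI_RANK : rem);
    PetscInt cend = cstart + max_cols / nprocs + (MPI_RANK < rem ? 1 : 0);

    // Count, per local row, entries in the diagonal vs. off-diagonal block.
    PetscInt *d_nnz = new PetscInt[n_rows]();   // () zero-initializes
    PetscInt *o_nnz = new PetscInt[n_rows]();
    for (int i = 0; i < n_nnz; i++) {
      if (col_idx[i] >= cstart && col_idx[i] < cend)
        d_nnz[row_idx[i]]++;
      else
        o_nnz[row_idx[i]]++;
    }
    // When the per-row arrays are given, the scalar d_nz/o_nz arguments are ignored.
    MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);
    delete[] d_nnz;
    delete[] o_nnz;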

-- 
Regards,
Ramki
 

On 6/15/17, 1:35 PM, "Barry Smith" <bsmith at mcs.anl.gov> wrote:

    
    > On Jun 15, 2017, at 12:31 PM, Jose E. Roman <jroman at dsic.upv.es> wrote:
    > 
    >> 
    >> On Jun 15, 2017, at 19:13, Barry Smith <bsmith at mcs.anl.gov> wrote:
    >> 
    >> 
    >>> On Jun 15, 2017, at 9:51 AM, Jose E. Roman <jroman at dsic.upv.es> wrote:
    >>> 
    >>> 
    >>>> On Jun 15, 2017, at 16:18, Kannan, Ramakrishnan <kannanr at ornl.gov> wrote:
    >>>> 
    >>>> I made the advised changes and rebuilt SLEPc. I ran it and the error still exists. Attached are the error file and the modified source file bvblas.c.
    >>> 
    >>> This is really weird. It seems that at some point the number of columns of the BV object differs by one in different MPI processes. The only explanation I can think of is that a threaded BLAS/LAPACK is giving slightly different results in each process. [A standalone illustration of this effect appears after the quoted thread.]
    
       So where in the code is the decision on how many columns to use made? If we look at that, it might help us see why it could ever produce different results on different processes.
    
    
    >> 
    >> 
    >>  Jose,
    >> 
    >>    Do you have a local calculation that generates the number of columns separately on each process, with the assumption that the result will be the same on all processes? Where is that code? You may need a global reduction where the processes "negotiate" what the number of columns should be after they do the local computation, for example take the maximum (or min or average) produced by all the processes. [A sketch of this appears at the end, after the quoted thread.]
    >> 
    >>  Barry
    >> 
    > 
    > My comment is related to the convergence criterion, which is based on a call to LAPACK (it is buried in the DS object, with no single clear spot in the code). It has been done this way in SLEPc for 15+ years, and no one has complained, so maybe it is not what is causing this problem. The thing is that I do not have access to these big machines, with Cray libraries etc., so I cannot reproduce the problem and am just suggesting things blindly.
    > 
    > Jose
    > 
    > 
    >> 
    >> 
    >> 
    >>> Which BLAS/LAPACK do you have? Can you run with threads turned off? An alternative would be to configure PETSc with --download-fblaslapack (or --download-f2cblaslapack).
    >>> 
    >>> Jose
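
As an aside, Jose's point above about threaded BLAS/LAPACK is easy to reproduce in isolation: floating-point addition is not associative, so the same reduction done in a different order (which is exactly what different thread counts or schedules cause) can round differently. A self-contained illustration, not SLEPc code:

    #include <cstdio>

    int main() {
      double x[] = {1e100, 1.0, -1e100, 1.0};
      double fwd = 0.0, rev = 0.0;
      for (int i = 0; i < 4; i++)  fwd += x[i];   // one summation order
      for (int i = 3; i >= 0; i--) rev += x[i];   // another order
      // Prints "forward = 1, reverse = 0": the small terms survive or are
      // absorbed depending on the order of the partial sums.
      std::printf("forward = %g, reverse = %g\n", fwd, rev);
      return 0;
    }

If each MPI process compares such a value against a tolerance on its own, processes can reach different conclusions, which would explain a BV whose number of columns differs by one across ranks.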
    
    
    

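And here is a minimal sketch of the kind of "negotiation" Barry suggests, with hypothetical names rather than SLEPc's actual code: each process computes its local candidate for the number of columns, then all processes agree on a single value before anything is resized:

    #include <petscsys.h>

    PetscInt negotiate_ncols(PetscInt local_ncols, MPI_Comm comm)
    {
      PetscInt global_ncols;
      // MPI_MIN keeps only the columns every process agrees on; MPI_MAX or
      // a broadcast from rank 0 would also work, as long as every process
      // applies the same rule.
      MPI_Allreduce(&local_ncols, &global_ncols, 1, MPIU_INT, MPI_MIN, comm);
      return global_ncols;
    }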

