[petsc-users] Usage of parallel FFT for doing batch 1d-FFTs over the columns of a dense 2d-matrix

Roland Richter roland.richter at ntnu.no
Tue Dec 8 08:42:30 CST 2020


I replaced the FFT-code with the following:

    int FFT_rank = 1;
    const int FFTW_size[] = {matrix_size};
    int howmany = last_owned_row_index - first_owned_row_index;
    int idist = 1;
    int odist = 1;
    int istride = matrix_size / size;
    int ostride = matrix_size / size;
    const int inembed[] = {matrix_size}, onembed[] = {matrix_size};
    fplan = fftw_plan_many_dft(FFT_rank, FFTW_size,
                               howmany,
                               data_in, inembed, istride, idist,
                               data_out, onembed, ostride, odist,
                               FFTW_FORWARD, FFTW_ESTIMATE);
    bplan = fftw_plan_many_dft(FFT_rank, FFTW_size,
                               howmany,
                               data_out, inembed, istride, idist,
                               data_out2, onembed, ostride, odist,
                               FFTW_BACKWARD, FFTW_ESTIMATE);

Now I also get the expected results for *mpirun -n x* with x > 1, but
only if the matrix size is an integer multiple of x; otherwise parts of
the resulting matrix are zeroed out. I therefore assume I made a
mistake with inembed and onembed, but I am not sure how they
influence the result. Do you have any suggestions?
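
For completeness, here is the direction I am currently considering (an
untested sketch, assuming the local block of the dense matrix is stored
column-major and that MatDenseGetLDA() returns its leading dimension):
take the stride from the local block instead of from matrix_size / size,
which is only correct when the size divides evenly.

    /* Sketch only, not verified: stride taken from the local leading dimension */
    PetscInt lda;
    MatDenseGetLDA(C, &lda);

    int FFT_rank = 1;
    const int FFTW_size[] = {(int) matrix_size};
    int howmany = last_owned_row_index - first_owned_row_index; /* locally owned rows */
    int idist   = 1;                 /* consecutive rows start one element apart      */
    int odist   = 1;
    int istride = (int) lda;         /* elements within one row are lda apart         */
    int ostride = (int) lda;
    const int inembed[] = {(int) matrix_size}, onembed[] = {(int) matrix_size};
    fplan = fftw_plan_many_dft(FFT_rank, FFTW_size, howmany,
                               data_in,  inembed, istride, idist,
                               data_out, onembed, ostride, odist,
                               FFTW_FORWARD, FFTW_ESTIMATE);
    /* the backward plan would be analogous, provided data_out2 is allocated
       with the same local layout (howmany * matrix_size elements) */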

Thanks,

Roland

On 08.12.20 at 14:55, Matthew Knepley wrote:
> On Tue, Dec 8, 2020 at 8:40 AM Roland Richter <roland.richter at ntnu.no
> <mailto:roland.richter at ntnu.no>> wrote:
>
>     Dear all,
>
>     I tried the following code:
>
>     int main(int argc, char **args) {
>         Mat C, F;
>         Vec x, y, z;
>         PetscViewer viewer;
>         PetscMPIInt rank, size;
>         PetscInitialize(&argc, &args, (char*) 0, help);
>
>         MPI_Comm_size(PETSC_COMM_WORLD, &size);
>         MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
>
>         PetscPrintf(PETSC_COMM_WORLD, "Number of processors = %d, rank = %d\n", size, rank);
>         //    std::cout << "From rank " << rank << '\n';
>
>         //MatCreate(PETSC_COMM_WORLD, &C);
>         PetscViewerCreate(PETSC_COMM_WORLD, &viewer);
>         PetscViewerSetType(viewer, PETSCVIEWERASCII);
>         arma::cx_mat local_mat, local_zero_mat;
>         const size_t matrix_size = 5;
>
>         MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
>                        matrix_size, matrix_size, NULL, &C);
>         MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
>                        matrix_size, matrix_size, NULL, &F);
>         if(rank == 0) {
>             arma::Col<int> indices = arma::linspace<arma::Col<int>>(0, matrix_size - 1, matrix_size);
>             //if(rank == 0) {
>             local_mat = arma::randu<arma::cx_mat>(matrix_size, matrix_size);
>             local_zero_mat = arma::zeros<arma::cx_mat>(matrix_size, matrix_size);
>             arma::cx_mat tmp_mat = local_mat.st();
>             MatSetValues(C, matrix_size, indices.memptr(),
>                          matrix_size, indices.memptr(), tmp_mat.memptr(), INSERT_VALUES);
>             MatSetValues(F, matrix_size, indices.memptr(),
>                          matrix_size, indices.memptr(), local_zero_mat.memptr(), INSERT_VALUES);
>         }
>
>         MatAssemblyBegin(C, MAT_FINAL_ASSEMBLY);
>         MatAssemblyEnd(C, MAT_FINAL_ASSEMBLY);
>         MatAssemblyBegin(F, MAT_FINAL_ASSEMBLY);
>         MatAssemblyEnd(F, MAT_FINAL_ASSEMBLY);
>
>         //FFT test
>         Mat FFT_A;
>         Vec input, output;
>         int first_owned_row_index = 0, last_owned_row_index = 0;
>         const int FFT_length[] = {matrix_size};
>
>         MatCreateFFT(PETSC_COMM_WORLD, 1, FFT_length, MATFFTW, &FFT_A);
>         MatCreateVecsFFTW(FFT_A, &x, &y, &z);
>         VecCreate(PETSC_COMM_WORLD, &input);
>         VecSetFromOptions(input);
>         VecSetSizes(input, PETSC_DECIDE, matrix_size);
>         VecCreate(PETSC_COMM_WORLD, &output);
>         VecSetFromOptions(output);
>         VecSetSizes(output, PETSC_DECIDE, matrix_size);
>         MatGetOwnershipRange(C, &first_owned_row_index, &last_owned_row_index);
>         std::cout << "Rank " << rank << " owns row " << first_owned_row_index
>                   << " to row " << last_owned_row_index << '\n';
>
>         //Testing FFT
>
>         /*---------------------------------------------------------*/
>         fftw_plan    fplan, bplan;
>         fftw_complex *data_in, *data_out, *data_out2;
>         ptrdiff_t    alloc_local, local_ni, local_i_start, local_n0, local_0_start;
>         PetscRandom rdm;
>
>         //    if (!rank)
>         //        printf("Use FFTW without PETSc-FFTW interface\n");
>         fftw_mpi_init();
>         int N  = matrix_size * matrix_size;
>         int N0 = matrix_size;
>         int N1 = matrix_size;
>         const ptrdiff_t n_data[] = {N0, 1};
>         //alloc_local = fftw_mpi_local_size_2d(N0, N1, PETSC_COMM_WORLD, &local_n0, &local_0_start);
>         alloc_local = fftw_mpi_local_size_many(1, n_data,
>                                                matrix_size,
>                                                FFTW_MPI_DEFAULT_BLOCK,
>                                                PETSC_COMM_WORLD,
>                                                &local_n0,
>                                                &local_0_start);
>         //data_in   = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);
>         PetscScalar *C_ptr, *F_ptr;
>         MatDenseGetArray(C, &C_ptr);
>         MatDenseGetArray(F, &F_ptr);
>         data_in  = reinterpret_cast<fftw_complex*>(C_ptr);
>         data_out = reinterpret_cast<fftw_complex*>(F_ptr);
>         data_out2 = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);
>
>         VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1,
>                               (PetscInt)N, (const PetscScalar*)data_in, &x);
>         PetscObjectSetName((PetscObject) x, "Real Space vector");
>         VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1,
>                               (PetscInt)N, (const PetscScalar*)data_out, &y);
>         PetscObjectSetName((PetscObject) y, "Frequency space vector");
>         VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1,
>                               (PetscInt)N, (const PetscScalar*)data_out2, &z);
>         PetscObjectSetName((PetscObject) z, "Reconstructed vector");
>
>         int FFT_rank = 1;
>         const ptrdiff_t FFTW_size[] = {matrix_size};
>         int howmany = last_owned_row_index - first_owned_row_index;
>         //std::cout << "Rank " << rank << " processes " << howmany << " rows\n";
>         int idist = matrix_size;   //1;
>         int odist = matrix_size;   //1;
>         int istride = 1;           //matrix_size;
>         int ostride = 1;           //matrix_size;
>         const ptrdiff_t *inembed = FFTW_size, *onembed = FFTW_size;
>         fplan = fftw_mpi_plan_many_dft(FFT_rank, FFTW_size,
>                                        howmany,
>                                        FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,
>                                        data_in, data_out,
>                                        PETSC_COMM_WORLD,
>                                        FFTW_FORWARD, FFTW_ESTIMATE);
>         bplan = fftw_mpi_plan_many_dft(FFT_rank, FFTW_size,
>                                        howmany,
>                                        FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,
>                                        data_out, data_out2,
>                                        PETSC_COMM_WORLD,
>                                        FFTW_BACKWARD, FFTW_ESTIMATE);
>
>         if (false) { VecView(x, PETSC_VIEWER_STDOUT_WORLD); }
>
>         fftw_execute(fplan);
>         if (false) { VecView(y, PETSC_VIEWER_STDOUT_WORLD); }
>
>         fftw_execute(bplan);
>
>         double a = 1.0 / matrix_size;
>         double enorm = 0;
>         VecScale(z, a);
>         if (false) { VecView(z, PETSC_VIEWER_STDOUT_WORLD); }
>         VecAXPY(z, -1.0, x);
>         VecNorm(z, NORM_1, &enorm);
>         if (enorm > 1.e-11) {
>             PetscPrintf(PETSC_COMM_SELF, "  Error norm of |x - z| %g\n", (double)enorm);
>         }
>
>         /* Free spaces */
>         fftw_destroy_plan(fplan);
>         fftw_destroy_plan(bplan);
>         fftw_free(data_out2);
>
>         //Generate test matrix for comparison
>         arma::cx_mat fft_test_mat = local_mat;
>         fft_test_mat.each_row([&](arma::cx_rowvec &a){
>             a = arma::fft(a);
>         });
>         std::cout << "-----------------------------------------------------\n";
>         std::cout << "Input matrix:\n" << local_mat << '\n';
>         MatView(C, viewer);
>         std::cout << "-----------------------------------------------------\n";
>         std::cout << "Expected output matrix:\n" << fft_test_mat << '\n';
>         MatView(F, viewer);
>         std::cout << "-----------------------------------------------------\n";
>         MatDestroy(&FFT_A);
>         VecDestroy(&input);
>         VecDestroy(&output);
>         VecDestroy(&x);
>         VecDestroy(&y);
>         VecDestroy(&z);
>         MatDestroy(&C);
>         MatDestroy(&F);
>         PetscViewerDestroy(&viewer);
>         PetscFinalize();
>         return 0;
>     }
>
>     For *mpirun -n 1* I get the expected output (i.e. Armadillo and
>     PETSc/FFTW return the same result), but for *mpirun -n x* with x >
>     1 every value that is not assigned to rank 0 is lost and set to
>     zero instead. Every value assigned to rank 0 is calculated
>     correctly, as far as I can see. Did I forget something here?
>
> I do not understand why your FFTW calls use the WORLD communicator.
> Aren't they serial FFTs over the local rows?
>
>   Thanks,
>
>      Matt
>
>     Thanks,
>
>     Roland
>
>     On 05.12.20 at 01:59, Barry Smith wrote:
>>
>>       Roland,
>>
>>         If you store your matrix as described in a parallel PETSc
>>     dense matrix, then you should be able to call fftw_plan_many_dft()
>>     directly on the array obtained with MatDenseGetArray(). You just
>>     need to pass the arguments for column-major ordering appropriately,
>>     probably identically to what you do in your previous code.
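>>
>>         A rough, untested sketch of what I mean (A and ncols are
>>     placeholders for your matrix and its row length): the local block
>>     returned by MatDenseGetArray() is stored column-major with leading
>>     dimension MatDenseGetLDA(), so entry (i, j) of the local block sits
>>     at array[(i - rstart) + j*lda], which fixes the stride and dist
>>     arguments:
>>
>>     PetscScalar *a;
>>     PetscInt    rstart, rend, lda;
>>     MatGetOwnershipRange(A, &rstart, &rend);
>>     MatDenseGetLDA(A, &lda);                   /* leading dimension of the local block */
>>     MatDenseGetArray(A, &a);
>>
>>     const int n[]     = {(int) ncols};         /* one 1-D FFT along each row           */
>>     int       howmany = (int) (rend - rstart); /* one transform per locally owned row  */
>>     fftw_plan plan    = fftw_plan_many_dft(1, n, howmany,
>>                             (fftw_complex*) a, n, (int) lda, 1,  /* istride = lda, idist = 1 */
>>                             (fftw_complex*) a, n, (int) lda, 1,  /* in place                 */
>>                             FFTW_FORWARD, FFTW_ESTIMATE);
>>     fftw_execute(plan);
>>     fftw_destroy_plan(plan);
>>     MatDenseRestoreArray(A, &a);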
>>
>>        Barry
>>
>>
>>>     On Dec 4, 2020, at 6:47 AM, Roland Richter
>>>     <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> wrote:
>>>
>>>     Ideally those FFTs could be handled in parallel, since they do not
>>>     depend on each other. Is that possible with MatFFT, or should I
>>>     rather use FFTW directly for that?
>>>
>>>     Thanks,
>>>
>>>     Roland
>>>
>>>     On 04.12.20 at 13:19, Matthew Knepley wrote:
>>>>     On Fri, Dec 4, 2020 at 5:32 AM Roland Richter
>>>>     <roland.richter at ntnu.no <mailto:roland.richter at ntnu.no>> wrote:
>>>>
>>>>         Hi,
>>>>
>>>>         I am currently working on a problem which requires a large
>>>>         number of transformations of a field E(r, t) from the time
>>>>         domain to the frequency domain E(r, w) and back. The field is
>>>>         stored in a 2d matrix, with the r-dimension along the columns
>>>>         and the t-dimension along the rows.
>>>>
>>>>         For the transformation from time to frequency space and back
>>>>         I therefore have to apply a 1d-FFT over each row of my matrix.
>>>>         In my earlier attempts I used Armadillo as the matrix library
>>>>         and FFTW for the transformations. There I could use
>>>>         fftw_plan_many_dft to do all FFTs at the same time.
>>>>         Unfortunately, Armadillo does not support MPI, and therefore I
>>>>         had to switch to PETSc for larger matrices.
>>>>
>>>>         Based on the examples (such as example 143), PETSc can do FFTs
>>>>         internally by creating an FFT object (using MatCreateFFT).
>>>>         Unfortunately, I cannot see how I could use that object for
>>>>         the operation described above without iterating over each row
>>>>         of my original matrix (i.e. doing it sequentially, not in
>>>>         parallel).
>>>>
>>>>         Ideally I could distribute the FFTs over my nodes such that
>>>>         each node takes several rows of the original matrix and
>>>>         applies the FFT to each of them. As an example, for a 4x4
>>>>         matrix and two nodes, node 0 would take rows 0 and 1, while
>>>>         node 1 takes rows 2 and 3, to avoid unnecessary memory
>>>>         transfer between the nodes while conducting the FFTs. Is that
>>>>         something PETSc can do, too?
>>>>
>>>>
>>>>     The way I understand our setup (I did not write it), we use
>>>>     plan_many_dft to handle FFTs over multiple dofs, but those would
>>>>     be interleaved. You want many FFTs for non-interleaved storage,
>>>>     which is not something we do right now. You could definitely call
>>>>     FFTW directly if you want.
>>>>
>>>>     Second, it sounds like you just want serial FFTs. You can
>>>>     definitely create a MatFFT with PETSC_COMM_SELF and apply it to
>>>>     each of the local rows, or create the plan yourself for the stack
>>>>     of rows.
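>>>>
>>>>     For the PETSC_COMM_SELF MatFFT route, a rough sketch (untested;
>>>>     A is your dense matrix and ncols its row length, both placeholders)
>>>>     of a per-rank FFT applied to each locally owned row could look like:
>>>>
>>>>     Mat      rowFFT;
>>>>     Vec      row_in, row_out, row_back;
>>>>     PetscInt rstart, rend;
>>>>     const PetscInt dim[] = {ncols};
>>>>
>>>>     MatCreateFFT(PETSC_COMM_SELF, 1, dim, MATFFTW, &rowFFT); /* serial FFT on each rank */
>>>>     MatCreateVecsFFTW(rowFFT, &row_in, &row_out, &row_back);
>>>>
>>>>     MatGetOwnershipRange(A, &rstart, &rend);
>>>>     for (PetscInt i = rstart; i < rend; ++i) {
>>>>       /* copy row i of A into row_in (e.g. MatGetRow + VecSetValues), then */
>>>>       MatMult(rowFFT, row_in, row_out);   /* forward FFT of that row */
>>>>       /* and copy row_out into row i of the result matrix */
>>>>     }
>>>>
>>>>     VecDestroy(&row_in);
>>>>     VecDestroy(&row_out);
>>>>     VecDestroy(&row_back);
>>>>     MatDestroy(&rowFFT);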
>>>>
>>>>        Thanks,
>>>>
>>>>          Matt
>>>>      
>>>>
>>>>         Thanks!
>>>>
>>>>         Regards,
>>>>
>>>>         Roland
>>>>
>>>>
>>>>
>>>>     -- 
>>>>     What most experimenters take for granted before they begin
>>>>     their experiments is infinitely more interesting than any
>>>>     results to which their experiments lead.
>>>>     -- Norbert Wiener
>>>>
>>>>     https://www.cse.buffalo.edu/~knepley/
>>>>     <http://www.cse.buffalo.edu/~knepley/>
>>
>
>
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>