[petsc-users] Usage of parallel FFT for doing batch 1d-FFTs over the columns of a dense 2d-matrix

Matthew Knepley knepley at gmail.com
Tue Dec 8 09:34:03 CST 2020


On Tue, Dec 8, 2020 at 9:42 AM Roland Richter <roland.richter at ntnu.no>
wrote:

> I replaced the FFT-code with the following:
>
>     int FFT_rank = 1;
>     const int FFTW_size[] = {matrix_size};
>     int howmany = last_owned_row_index - first_owned_row_index;
>     int idist = 1;
>     int odist = 1;
>     int istride = matrix_size / size;
>     int ostride = matrix_size / size;
>     const int inembed[] = {matrix_size}, onembed[] = {matrix_size};
>     fplan = fftw_plan_many_dft(FFT_rank, FFTW_size,
>                                howmany,
>                                data_in, inembed, istride, idist,
>                                data_out, onembed, ostride, odist,
>                                FFTW_FORWARD, FFTW_ESTIMATE);
>     bplan = fftw_plan_many_dft(FFT_rank, FFTW_size,
>                                howmany,
>                                data_out, inembed, istride, idist,
>                                data_out2, onembed, ostride, odist,
>                                FFTW_BACKWARD, FFTW_ESTIMATE);
>
> Now I get the expected results for mpirun -n x with x > 1 as well, but
> only if my matrix size is an integer multiple of x; otherwise some parts of
> the resulting matrix are zeroed out. I therefore assume I made a mistake
> with inembed and onembed, but I am not sure how they influence the result.
> Do you have any suggestions?
>
I don't know what those arguments mean.

  Thanks,

     Matt

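A minimal, untested sketch of how fftw_plan_many_dft()'s layout arguments
can be set up for one 1d FFT per locally owned row of a column-major PETSc
dense matrix (the names local_rows and lda below come from MatGetLocalSize()
and MatDenseGetLDA() and are not part of the code quoted above; data_in and
data_out are meant as the fftw_complex views of the MatDenseGetArray()
pointers). Taking the stride from the actual local leading dimension rather
than from matrix_size / size is likely what matters when the matrix size is
not an integer multiple of the number of ranks:

    /* Sketch only: plan one 1d FFT per locally owned row of the
       column-major local block. */
    PetscInt local_rows, local_cols, lda;
    MatGetLocalSize(C, &local_rows, &local_cols); /* rows owned by this rank */
    MatDenseGetLDA(C, &lda);                      /* local leading dimension */

    int n[1]    = {(int)matrix_size};  /* length of each transform (= number of columns) */
    int howmany = (int)local_rows;     /* one transform per locally owned row */
    /* Column-major storage: the elements of one row are lda apart,
       and consecutive rows start 1 apart. */
    fplan = fftw_plan_many_dft(1, n, howmany,
                               data_in,  NULL, /* istride */ (int)lda, /* idist */ 1,
                               data_out, NULL, /* ostride */ (int)lda, /* odist */ 1,
                               FFTW_FORWARD, FFTW_ESTIMATE);
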
> Thanks,
>
> Roland
> On 08.12.20 at 14:55, Matthew Knepley wrote:
>
> On Tue, Dec 8, 2020 at 8:40 AM Roland Richter <roland.richter at ntnu.no>
> wrote:
>
>> Dear all,
>>
>> I tried the following code:
>>
>> int main(int argc, char **args) {
>>     Mat C, F;
>>     Vec x, y, z;
>>     PetscViewer viewer;
>>     PetscMPIInt rank, size;
>>     PetscInitialize(&argc, &args, (char*) 0, help);
>>
>>     MPI_Comm_size(PETSC_COMM_WORLD, &size);
>>     MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
>>
>>     PetscPrintf(PETSC_COMM_WORLD,"Number of processors = %d, rank = %d\n", size, rank);
>>     //    std::cout << "From rank " << rank << '\n';
>>
>>     //MatCreate(PETSC_COMM_WORLD, &C);
>>     PetscViewerCreate(PETSC_COMM_WORLD, &viewer);
>>     PetscViewerSetType(viewer, PETSCVIEWERASCII);
>>     arma::cx_mat local_mat, local_zero_mat;
>>     const size_t matrix_size = 5;
>>
>>     MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size, matrix_size, NULL, &C);
>>     MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size, matrix_size, NULL, &F);
>>     if(rank == 0) {
>>         arma::Col<int> indices = arma::linspace<arma::Col<int>>(0, matrix_size - 1, matrix_size);
>>         //if(rank == 0) {
>>         local_mat = arma::randu<arma::cx_mat>(matrix_size, matrix_size);
>>         local_zero_mat = arma::zeros<arma::cx_mat>(matrix_size, matrix_size);
>>         arma::cx_mat tmp_mat = local_mat.st();
>>         MatSetValues(C, matrix_size, indices.memptr(), matrix_size, indices.memptr(), tmp_mat.memptr(), INSERT_VALUES);
>>         MatSetValues(F, matrix_size, indices.memptr(), matrix_size, indices.memptr(), local_zero_mat.memptr(), INSERT_VALUES);
>>     }
>>
>>     MatAssemblyBegin(C, MAT_FINAL_ASSEMBLY);
>>     MatAssemblyEnd(C, MAT_FINAL_ASSEMBLY);
>>     MatAssemblyBegin(F, MAT_FINAL_ASSEMBLY);
>>     MatAssemblyEnd(F, MAT_FINAL_ASSEMBLY);
>>
>>     //FFT test
>>     Mat FFT_A;
>>     Vec input, output;
>>     int first_owned_row_index = 0, last_owned_row_index = 0;
>>     const int FFT_length[] = {matrix_size};
>>
>>     MatCreateFFT(PETSC_COMM_WORLD, 1, FFT_length, MATFFTW, &FFT_A);
>>     MatCreateVecsFFTW(FFT_A, &x, &y, &z);
>>     VecCreate(PETSC_COMM_WORLD, &input);
>>     VecSetFromOptions(input);
>>     VecSetSizes(input, PETSC_DECIDE, matrix_size);
>>     VecCreate(PETSC_COMM_WORLD, &output);
>>     VecSetFromOptions(output);
>>     VecSetSizes(output, PETSC_DECIDE, matrix_size);
>>     MatGetOwnershipRange(C, &first_owned_row_index, &last_owned_row_index);
>>     std::cout << "Rank " << rank << " owns row " << first_owned_row_index << " to row " << last_owned_row_index << '\n';
>>
>>     //Testing FFT
>>     /*---------------------------------------------------------*/
>>     fftw_plan    fplan, bplan;
>>     fftw_complex *data_in, *data_out, *data_out2;
>>     ptrdiff_t    alloc_local, local_ni, local_i_start, local_n0, local_0_start;
>>     PetscRandom rdm;
>>
>>     //    if (!rank)
>>     //        printf("Use FFTW without PETSc-FFTW interface\n");
>>     fftw_mpi_init();
>>     int N  = matrix_size * matrix_size;
>>     int N0 = matrix_size;
>>     int N1 = matrix_size;
>>     const ptrdiff_t n_data[] = {N0, 1};
>>     //alloc_local = fftw_mpi_local_size_2d(N0, N1, PETSC_COMM_WORLD, &local_n0, &local_0_start);
>>     alloc_local = fftw_mpi_local_size_many(1, n_data,
>>                                            matrix_size,
>>                                            FFTW_MPI_DEFAULT_BLOCK,
>>                                            PETSC_COMM_WORLD,
>>                                            &local_n0,
>>                                            &local_0_start);
>>     //data_in = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);
>>     PetscScalar *C_ptr, *F_ptr;
>>     MatDenseGetArray(C, &C_ptr);
>>     MatDenseGetArray(F, &F_ptr);
>>     data_in = reinterpret_cast<fftw_complex*>(C_ptr);
>>     data_out = reinterpret_cast<fftw_complex*>(F_ptr);
>>     data_out2 = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);
>>
>>     VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1, (PetscInt)N, (const PetscScalar*)data_in, &x);
>>     PetscObjectSetName((PetscObject) x, "Real Space vector");
>>     VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1, (PetscInt)N, (const PetscScalar*)data_out, &y);
>>     PetscObjectSetName((PetscObject) y, "Frequency space vector");
>>     VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1, (PetscInt)N, (const PetscScalar*)data_out2, &z);
>>     PetscObjectSetName((PetscObject) z, "Reconstructed vector");
>>
>>     int FFT_rank = 1;
>>     const ptrdiff_t FFTW_size[] = {matrix_size};
>>     int howmany = last_owned_row_index - first_owned_row_index;
>>     //std::cout << "Rank " << rank << " processes " << howmany << " rows\n";
>>     int idist = matrix_size;   //1;
>>     int odist = matrix_size;   //1;
>>     int istride = 1;           //matrix_size;
>>     int ostride = 1;           //matrix_size;
>>     const ptrdiff_t *inembed = FFTW_size, *onembed = FFTW_size;
>>     fplan = fftw_mpi_plan_many_dft(FFT_rank, FFTW_size,
>>                                    howmany,
>>                                    FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,
>>                                    data_in, data_out,
>>                                    PETSC_COMM_WORLD,
>>                                    FFTW_FORWARD, FFTW_ESTIMATE);
>>     bplan = fftw_mpi_plan_many_dft(FFT_rank, FFTW_size,
>>                                    howmany,
>>                                    FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,
>>                                    data_out, data_out2,
>>                                    PETSC_COMM_WORLD,
>>                                    FFTW_BACKWARD, FFTW_ESTIMATE);
>>
>>     if (false) {VecView(x, PETSC_VIEWER_STDOUT_WORLD);}
>>
>>     fftw_execute(fplan);
>>     if (false) {VecView(y, PETSC_VIEWER_STDOUT_WORLD);}
>>
>>     fftw_execute(bplan);
>>
>>     double a = 1.0 / matrix_size;
>>     double enorm = 0;
>>     VecScale(z, a);
>>     if (false) {VecView(z, PETSC_VIEWER_STDOUT_WORLD);}
>>     VecAXPY(z, -1.0, x);
>>     VecNorm(z, NORM_1, &enorm);
>>     if (enorm > 1.e-11) {
>>         PetscPrintf(PETSC_COMM_SELF, "  Error norm of |x - z| %g\n", (double)enorm);
>>     }
>>
>>     /* Free spaces */
>>     fftw_destroy_plan(fplan);
>>     fftw_destroy_plan(bplan);
>>     fftw_free(data_out2);
>>
>>     //Generate test matrix for comparison
>>     arma::cx_mat fft_test_mat = local_mat;
>>     fft_test_mat.each_row([&](arma::cx_rowvec &a){
>>         a = arma::fft(a);
>>     });
>>     std::cout << "-----------------------------------------------------\n";
>>     std::cout << "Input matrix:\n" << local_mat << '\n';
>>     MatView(C, viewer);
>>     std::cout << "-----------------------------------------------------\n";
>>     std::cout << "Expected output matrix:\n" << fft_test_mat << '\n';
>>     MatView(F, viewer);
>>     std::cout << "-----------------------------------------------------\n";
>>     MatDestroy(&FFT_A);
>>     VecDestroy(&input);
>>     VecDestroy(&output);
>>     VecDestroy(&x);
>>     VecDestroy(&y);
>>     VecDestroy(&z);
>>     MatDestroy(&C);
>>     MatDestroy(&F);
>>     PetscViewerDestroy(&viewer);
>>     PetscFinalize();
>>     return 0;
>> }
>>
>> For mpirun -n 1 I get the expected output (i.e. armadillo and PETSc/FFTW
>> return the same result), but for mpirun -n x with x > 1 every value which
>> is not assigned to rank 0 is lost and set to zero instead. Every value
>> assigned to rank 0 is calculated correctly, as far as I can see. Did I
>> forget something here?
>>
> I do not understand why your FFTW calls use the WORLD communicator. Aren't
> they serial FFTs over the local rows?
>
>   Thanks,
>
>      Matt
>
>> Thanks,
>>
>> Roland
>> On 05.12.20 at 01:59, Barry Smith wrote:
>>
>>
>>   Roland,
>>
>>     If you store your matrix as described in a parallel PETSc dense
>> matrix, then you should be able to call fftw_plan_many_dft() directly on
>> the array obtained with MatDenseGetArray(). You just need to pass the
>> arguments for the column-major ordering appropriately, probably
>> identically to what you do in your previous code.
>>
>>    Barry
>>
>>
>> On Dec 4, 2020, at 6:47 AM, Roland Richter <roland.richter at ntnu.no>
>> wrote:
>>
>> Ideally those FFTs could be handled in parallel, since they do not depend
>> on each other. Is that possible with MatFFT, or should I rather use FFTW
>> for that?
>>
>> Thanks,
>>
>> Roland
>> On 04.12.20 at 13:19, Matthew Knepley wrote:
>>
>> On Fri, Dec 4, 2020 at 5:32 AM Roland Richter <roland.richter at ntnu.no>
>> wrote:
>>
>>> Hi,
>>>
>>> I am currently working on a problem which requires a large number of
>>> transformations of a field E(r, t) from the time domain to the Fourier
>>> domain E(r, w) and back. The field is stored in a 2d matrix, with the
>>> r-dimension along the columns and the t-dimension along the rows.
>>>
>>> For the transformation from time to frequency space and back I therefore
>>> have to apply a 1d FFT to each row of my matrix. In my earlier attempts I
>>> used armadillo as the matrix library and FFTW for the transformations.
>>> There I could use fftw_plan_many_dft to do all FFTs at the same time.
>>> Unfortunately, armadillo does not support MPI, so I had to switch to
>>> PETSc for larger matrices.
>>>
>>> Based on the examples (such as example 143), PETSc has a way of doing
>>> FFTs internally by creating an FFT object (using MatCreateFFT).
>>> Unfortunately, I cannot see how I could use that object to perform the
>>> operation described above without iterating over each row of my original
>>> matrix (i.e. doing it sequentially, not in parallel).
>>>
>>> Ideally I could distribute the FFTs over my nodes such that each node
>>> takes several rows of the original matrix and applies the FFT to each of
>>> them. As an example, for a 4x4 matrix and two nodes, node 0 would take
>>> rows 0 and 1, while node 1 takes rows 2 and 3, to avoid unnecessary
>>> memory transfer between the nodes while conducting the FFTs. Is that
>>> something PETSc can do, too?
>>>
>>
>> The way I understand our setup (I did not write it), we use plan_many_dft
>> to handle multiple dof FFTs, but these would be interlaced. You want many
>> FFTs for non-interlaced storage, which is not something we do right now.
>> You could definitely call FFTW directly if you want.
>>
>> Second, from the above it seems like you just want serial FFTs. You can
>> definitely create a MatFFT with PETSC_COMM_SELF and apply it to each of
>> the local rows, or create the plan yourself for the stack of rows.
>>
>>    Thanks,
>>
>>      Matt
>>
>>
>>> Thanks!
>>>
>>> Regards,
>>>
>>> Roland
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
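
For reference, a rough, untested sketch of the other approach mentioned in
the thread, a serial MatFFT on PETSC_COMM_SELF applied once per locally
owned row. It assumes a PETSc build with complex scalars and FFTW; the
names F1d, xrow, yrow, zrow are illustrative and error checking is omitted:

    /* Serial 1d FFT over each locally owned row of the parallel dense
       matrix C, writing the transformed rows into F. */
    Mat         F1d;
    Vec         xrow, yrow, zrow;
    PetscInt    dim[1], rstart, rend, lda_c, lda_f, ncols;
    PetscScalar *carr, *farr, *xa;
    const PetscScalar *ya;

    MatGetSize(C, NULL, &ncols);
    dim[0] = ncols;
    MatCreateFFT(PETSC_COMM_SELF, 1, dim, MATFFTW, &F1d); /* one serial 1d FFT */
    MatCreateVecsFFTW(F1d, &xrow, &yrow, &zrow);

    MatGetOwnershipRange(C, &rstart, &rend);
    MatDenseGetArray(C, &carr);
    MatDenseGetArray(F, &farr);
    MatDenseGetLDA(C, &lda_c);
    MatDenseGetLDA(F, &lda_f);

    for (PetscInt i = rstart; i < rend; ++i) {
        const PetscInt il = i - rstart;     /* local row index */
        VecGetArray(xrow, &xa);
        for (PetscInt j = 0; j < ncols; ++j) xa[j] = carr[il + j * lda_c];
        VecRestoreArray(xrow, &xa);

        MatMult(F1d, xrow, yrow);           /* forward FFT of this row */

        VecGetArrayRead(yrow, &ya);
        for (PetscInt j = 0; j < ncols; ++j) farr[il + j * lda_f] = ya[j];
        VecRestoreArrayRead(yrow, &ya);
    }

    MatDenseRestoreArray(C, &carr);
    MatDenseRestoreArray(F, &farr);
    VecDestroy(&xrow); VecDestroy(&yrow); VecDestroy(&zrow);
    MatDestroy(&F1d);

Each MatMult here is an independent serial FFT of one local row, so every
rank works on its own rows without any communication.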