[petsc-users] Usage of parallel FFT for doing batch 1d-FFTs over the columns of a dense 2d-matrix

Roland Richter roland.richter at ntnu.no
Tue Dec 8 07:39:48 CST 2020


Dear all,

I tried the following code:

static char help[] = "";

#include <petscmat.h>
#include <petscviewer.h>
#include <fftw3-mpi.h>
#include <armadillo>
#include <iostream>

int main(int argc, char **args) {
    Mat C, F;
    Vec x, y, z;
    PetscViewer viewer;
    PetscMPIInt rank, size;
    PetscInitialize(&argc, &args, (char*) 0, help);

    MPI_Comm_size(PETSC_COMM_WORLD, &size);
    MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

    PetscPrintf(PETSC_COMM_WORLD, "Number of processors = %d, rank = %d\n", size, rank);
    //    std::cout << "From rank " << rank << '\n';

    //MatCreate(PETSC_COMM_WORLD, &C);
    PetscViewerCreate(PETSC_COMM_WORLD, &viewer);
    PetscViewerSetType(viewer, PETSCVIEWERASCII);
    arma::cx_mat local_mat, local_zero_mat;
    const size_t matrix_size = 5;

    MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size, matrix_size, NULL, &C);
    MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size, matrix_size, NULL, &F);
    // only rank 0 generates the random input data and inserts it into C and F
    if(rank == 0) {
        arma::Col<int> indices = arma::linspace<arma::Col<int>>(0, matrix_size - 1, matrix_size);
        //if(rank == 0) {
        local_mat = arma::randu<arma::cx_mat>(matrix_size, matrix_size);
        local_zero_mat = arma::zeros<arma::cx_mat>(matrix_size, matrix_size);
        arma::cx_mat tmp_mat = local_mat.st();
        MatSetValues(C, matrix_size, indices.memptr(), matrix_size, indices.memptr(), tmp_mat.memptr(), INSERT_VALUES);
        MatSetValues(F, matrix_size, indices.memptr(), matrix_size, indices.memptr(), local_zero_mat.memptr(), INSERT_VALUES);
    }

    MatAssemblyBegin(C, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(C, MAT_FINAL_ASSEMBLY);
    MatAssemblyBegin(F, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(F, MAT_FINAL_ASSEMBLY);

    //FFT test
    Mat FFT_A;
    Vec input, output;
    int first_owned_row_index = 0, last_owned_row_index = 0;
    const int FFT_length[] = {matrix_size};

    MatCreateFFT(PETSC_COMM_WORLD, 1, FFT_length, MATFFTW, &FFT_A);
    MatCreateVecsFFTW(FFT_A, &x, &y, &z);
    VecCreate(PETSC_COMM_WORLD, &input);
    VecSetFromOptions(input);
    VecSetSizes(input, PETSC_DECIDE, matrix_size);
    VecCreate(PETSC_COMM_WORLD, &output);
    VecSetFromOptions(output);
    VecSetSizes(output, PETSC_DECIDE, matrix_size);
    MatGetOwnershipRange(C, &first_owned_row_index, &last_owned_row_index);
    std::cout << "Rank " << rank << " owns row " << first_owned_row_index << " to row " << last_owned_row_index << '\n';

    //Testing FFT

    /*---------------------------------------------------------*/
    fftw_plan    fplan, bplan;
    fftw_complex *data_in, *data_out, *data_out2;
    ptrdiff_t    alloc_local, local_ni, local_i_start, local_n0, local_0_start;
    PetscRandom rdm;

    //    if (!rank)
    //        printf("Use FFTW without PETSc-FFTW interface\n");
    fftw_mpi_init();
    int N  = matrix_size * matrix_size;
    int N0 = matrix_size;
    int N1 = matrix_size;
    const ptrdiff_t n_data[] = {N0, 1};
    //alloc_local = fftw_mpi_local_size_2d(N0,N1,PETSC_COMM_WORLD,&local_n0,&local_0_start);
    alloc_local = fftw_mpi_local_size_many(1, n_data,
                                           matrix_size,
                                           FFTW_MPI_DEFAULT_BLOCK,
                                           PETSC_COMM_WORLD,
                                           &local_n0,
                                           &local_0_start);
    //data_in = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);
    PetscScalar *C_ptr, *F_ptr;
    MatDenseGetArray(C, &C_ptr);
    MatDenseGetArray(F, &F_ptr);
    data_in   = reinterpret_cast<fftw_complex*>(C_ptr);
    data_out  = reinterpret_cast<fftw_complex*>(F_ptr);
    data_out2 = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);

    VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1, (PetscInt)N, (const PetscScalar*)data_in, &x);
    PetscObjectSetName((PetscObject) x, "Real Space vector");
    VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1, (PetscInt)N, (const PetscScalar*)data_out, &y);
    PetscObjectSetName((PetscObject) y, "Frequency space vector");
    VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1, (PetscInt)N, (const PetscScalar*)data_out2, &z);
    PetscObjectSetName((PetscObject) z, "Reconstructed vector");

    int FFT_rank = 1;
    const ptrdiff_t FFTW_size[] = {matrix_size};
    int howmany = last_owned_row_index - first_owned_row_index;
    //std::cout << "Rank " << rank << " processes " << howmany << " rows\n";
    int idist = matrix_size;//1;
    int odist = matrix_size;//1;
    int istride = 1;//matrix_size;
    int ostride = 1;//matrix_size;
    const ptrdiff_t *inembed = FFTW_size, *onembed = FFTW_size;
    // forward/backward FFTW-MPI plans over the locally stored data
    fplan = fftw_mpi_plan_many_dft(FFT_rank, FFTW_size,
                                   howmany,
                                   FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,
                                   data_in, data_out,
                                   PETSC_COMM_WORLD,
                                   FFTW_FORWARD, FFTW_ESTIMATE);
    bplan = fftw_mpi_plan_many_dft(FFT_rank, FFTW_size,
                                   howmany,
                                   FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,
                                   data_out, data_out2,
                                   PETSC_COMM_WORLD,
                                   FFTW_BACKWARD, FFTW_ESTIMATE);

    if (false) {VecView(x, PETSC_VIEWER_STDOUT_WORLD);}

    fftw_execute(fplan);
    if (false) {VecView(y, PETSC_VIEWER_STDOUT_WORLD);}

    fftw_execute(bplan);

    double a = 1.0 / matrix_size;
    double enorm = 0;
    VecScale(z, a);
    if (false) {VecView(z, PETSC_VIEWER_STDOUT_WORLD);}
    VecAXPY(z, -1.0, x);
    VecNorm(z, NORM_1, &enorm);
    if (enorm > 1.e-11) {
        PetscPrintf(PETSC_COMM_SELF, "  Error norm of |x - z| %g\n", (double)enorm);
    }

    /* Free spaces */
    fftw_destroy_plan(fplan);
    fftw_destroy_plan(bplan);
    fftw_free(data_out2);

    //Generate test matrix for comparison
    arma::cx_mat fft_test_mat = local_mat;
    fft_test_mat.each_row([&](arma::cx_rowvec &a){
        a = arma::fft(a);
    });
    std::cout << "-----------------------------------------------------\n";
    std::cout << "Input matrix:\n" << local_mat << '\n';
    MatView(C, viewer);
    std::cout << "-----------------------------------------------------\n";
    std::cout << "Expected output matrix:\n" << fft_test_mat << '\n';
    MatView(F, viewer);
    std::cout << "-----------------------------------------------------\n";
    MatDestroy(&FFT_A);
    VecDestroy(&input);
    VecDestroy(&output);
    VecDestroy(&x);
    VecDestroy(&y);
    VecDestroy(&z);
    MatDestroy(&C);
    MatDestroy(&F);
    PetscViewerDestroy(&viewer);
    PetscFinalize();
    return 0;
}

For *mpirun -n 1* I get the expected output (i.e. armadillo and
PETSc/FFTW return the same result), but for *mpirun -n x* with x > 1
every value that is not owned by rank 0 is lost and set to zero
instead. The values owned by rank 0 are calculated correctly, as far
as I can see. Did I forget something here?
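
If I understand Barry's suggestion below correctly, an alternative would be
to skip the fftw_mpi interface entirely and plan a batch of serial 1d-FFTs
per rank directly on the pointer from MatDenseGetArray(). A minimal sketch
of how I read that, for the matrices C and F from the code above (assuming
PETSc built with complex scalars, and using MatDenseGetLDA() to get the
local leading dimension of the column-major storage):

PetscScalar *c_arr, *f_arr;
PetscInt    rstart, rend, lda, N_cols;

MatGetSize(C, NULL, &N_cols);            /* global number of columns */
MatGetOwnershipRange(C, &rstart, &rend); /* locally owned rows       */
MatDenseGetLDA(C, &lda);                 /* local leading dimension  */
MatDenseGetArray(C, &c_arr);
MatDenseGetArray(F, &f_arr);

/* Column-major local storage: the elements of one row are lda apart,
   consecutive rows start one element apart. FFTW_ESTIMATE does not
   overwrite the arrays during planning. */
const int n[1]    = {(int)N_cols};
const int howmany = (int)(rend - rstart);
fftw_plan rows_plan = fftw_plan_many_dft(1, n, howmany,
                                         reinterpret_cast<fftw_complex*>(c_arr),
                                         NULL, (int)lda, 1,
                                         reinterpret_cast<fftw_complex*>(f_arr),
                                         NULL, (int)lda, 1,
                                         FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(rows_plan);
fftw_destroy_plan(rows_plan);

MatDenseRestoreArray(C, &c_arr);
MatDenseRestoreArray(F, &f_arr);

Is that the intended reading of the column-major arguments, or am I
misinterpreting the stride/dist layout?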

Thanks,

Roland
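
P.S.: As a cross-check, the serial variant Matt mentioned below (one MatFFT
on PETSC_COMM_SELF, applied row by row to the locally owned rows) would, as
far as I understand it, look roughly like the following, again assuming
complex PETSc scalars and reusing C, F and matrix_size from the code above.
The only bookkeeping is copying one row in and out of the column-major local
array; the row vectors are created with MatCreateVecsFFTW() as before.

Mat            FFT_row;
Vec            row_in, row_out, row_back;
const PetscInt dim[1] = {(PetscInt)matrix_size};
PetscScalar    *c_arr, *f_arr;
PetscInt       rstart, rend, lda;

MatCreateFFT(PETSC_COMM_SELF, 1, dim, MATFFTW, &FFT_row);
MatCreateVecsFFTW(FFT_row, &row_in, &row_out, &row_back);

MatGetOwnershipRange(C, &rstart, &rend);
MatDenseGetLDA(C, &lda);
MatDenseGetArray(C, &c_arr);
MatDenseGetArray(F, &f_arr);

for (PetscInt i = 0; i < rend - rstart; ++i) {
    PetscScalar       *in;
    const PetscScalar *out;

    /* copy local row i out of the column-major dense storage */
    VecGetArray(row_in, &in);
    for (PetscInt j = 0; j < (PetscInt)matrix_size; ++j) in[j] = c_arr[i + j * lda];
    VecRestoreArray(row_in, &in);

    MatMult(FFT_row, row_in, row_out); /* forward 1d-FFT of this row */

    /* copy the transformed row into F's local storage */
    VecGetArrayRead(row_out, &out);
    for (PetscInt j = 0; j < (PetscInt)matrix_size; ++j) f_arr[i + j * lda] = out[j];
    VecRestoreArrayRead(row_out, &out);
}

MatDenseRestoreArray(C, &c_arr);
MatDenseRestoreArray(F, &f_arr);
VecDestroy(&row_in);
VecDestroy(&row_out);
VecDestroy(&row_back);
MatDestroy(&FFT_row);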

On 05.12.20 at 01:59, Barry Smith wrote:
>
>   Roland,
>
>     If you store your matrix as described in a parallel PETSc dense
> matrix, then you should be able to call fftw_plan_many_dft() directly
> on the value obtained with MatDenseGetArray(). You just need to pass
> the arguments regarding column-major ordering appropriately, probably
> identically to what you do with your previous code.
>
>    Barry
>
>
>> On Dec 4, 2020, at 6:47 AM, Roland Richter <roland.richter at ntnu.no> wrote:
>>
>> Ideally those FFTs could be handled in parallel, since they do not
>> depend on each other. Is that possible with MatFFT, or should I
>> rather use FFTW for that?
>>
>> Thanks,
>>
>> Roland
>>
>> On 04.12.20 at 13:19, Matthew Knepley wrote:
>>> On Fri, Dec 4, 2020 at 5:32 AM Roland Richter
>>> <roland.richter at ntnu.no> wrote:
>>>
>>>     Hei,
>>>
>>>     I am currently working on a problem which requires a large number
>>>     of transformations of a field E(r, t) from time space to Fourier
>>>     space E(r, w) and back. The field is described in a 2d-matrix,
>>>     with the r-dimension along the columns and the t-dimension along
>>>     the rows.
>>>
>>>     For the transformation from time to frequency space and back I
>>>     therefore have to apply a 1d-FFT operation over each row of my
>>>     matrix. For my earlier attempts I used armadillo as the matrix
>>>     library and FFTW for doing the transformations. Here I could use
>>>     fftw_plan_many_dft to do all FFTs at the same time. Unfortunately,
>>>     armadillo does not support MPI, and therefore I had to switch to
>>>     PETSc for larger matrices.
>>>
>>>     Based on the examples (such as example 143) PETSc has a way of doing
>>>     FFTs internally by creating an FFT object (using MatCreateFFT).
>>>     Unfortunately, I cannot see how I could use that object to
>>>     conduct the operation described above without having to iterate
>>>     over each row in my original matrix (i.e. doing it sequentially,
>>>     not in parallel).
>>>
>>>     Ideally I could distribute the FFTs over my nodes such that each
>>>     node takes several rows of the original matrix and applies the
>>>     FFT to each of them. As an example, for a 4x4 matrix and two
>>>     nodes, node 0 would take rows 0 and 1, while node 1 takes rows 2
>>>     and 3, to avoid unnecessary memory transfer between the nodes
>>>     while conducting the FFTs.
>>>     Is that something PETSc can do, too?
>>>
>>>
>>> The way I understand our setup (I did not write it), we use
>>> plan_many_dft to handle
>>> multiple dof FFTs, but these would be interlaced. You want many FFTs
>>> for non-interlaced
>>> storage, which is not something we do right now. You could
>>> definitely call FFTW directly
>>> if you want.
>>>
>>> Second, above it seems like you just want serial FFTs. You can
>>> definitely create a MatFFT with PETSC_COMM_SELF and apply it to each
>>> of the local rows, or create the plan yourself for the stack of rows.
>>>
>>>    Thanks,
>>>
>>>      Matt
>>>  
>>>
>>>     Thanks!
>>>
>>>     Regards,
>>>
>>>     Roland
>>>
>>>
>>>
>>> -- 
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>