[petsc-users] Usage of parallel FFT for doing batch 1d-FFTs over the columns of a dense 2d-matrix
Roland Richter
roland.richter at ntnu.no
Tue Dec 8 08:42:30 CST 2020
I replaced the FFT code with the following:
int FFT_rank = 1;
const int FFTW_size[] = {matrix_size};
int howmany = last_owned_row_index - first_owned_row_index;
int idist = 1;
int odist = 1;
int istride = matrix_size / size;
int ostride = matrix_size / size;
const int inembed[] = {matrix_size}, onembed[] = {matrix_size};
fplan = fftw_plan_many_dft(FFT_rank, FFTW_size,
                           howmany,
                           data_in, inembed, istride, idist,
                           data_out, onembed, ostride, odist,
                           FFTW_FORWARD, FFTW_ESTIMATE);
bplan = fftw_plan_many_dft(FFT_rank, FFTW_size,
                           howmany,
                           data_out, inembed, istride, idist,
                           data_out2, onembed, ostride, odist,
                           FFTW_BACKWARD, FFTW_ESTIMATE);
Now I also get the expected results for *mpirun -n x* with x > 1, but
only if the matrix size is an integer multiple of x; otherwise some
parts of the resulting matrix are zeroed out. I therefore assume I made
a mistake with inembed and onembed, but I am not sure how they
influence the result. Do you have any suggestions?
Thanks,
Roland
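
As a point of reference, here is a minimal, untested sketch of how the
advanced-interface arguments can address one 1d transform per locally
owned row when the local block of a PETSc dense matrix is stored
column-major. The helper name plan_and_run_row_ffts is purely
illustrative, the leading dimension is taken from MatDenseGetLDA()
rather than derived from matrix_size / size, and a complex-scalar PETSc
build is assumed. For a rank-1 plan, NULL inembed/onembed (treated by
FFTW as equal to n) is sufficient; the addressing is determined by
istride/idist, with element j of transform k located at
in[k*idist + j*istride].

#include <fftw3.h>
#include <petscmat.h>

/* Sketch: one 1-D FFT per locally owned row of a column-major local block.
   Assumes PetscScalar is complex (PETSc configured with complex scalars). */
static void plan_and_run_row_ffts(Mat C, Mat F, PetscInt n_cols)
{
  PetscInt    first, last, lda;
  PetscScalar *c_ptr, *f_ptr;

  MatGetOwnershipRange(C, &first, &last);
  MatDenseGetLDA(C, &lda);           /* leading dimension of the local block */
  MatDenseGetArray(C, &c_ptr);
  MatDenseGetArray(F, &f_ptr);

  /* Column-major local block: element (row r, column j) sits at r + j*lda.
     Within one row, consecutive elements are lda apart -> istride = lda.
     Consecutive rows start one entry apart             -> idist   = 1.   */
  const int n[1]    = {(int)n_cols};
  const int howmany = (int)(last - first);
  fftw_plan fplan   = fftw_plan_many_dft(1, n, howmany,
                          reinterpret_cast<fftw_complex*>(c_ptr),
                          NULL /*inembed*/, (int)lda /*istride*/, 1 /*idist*/,
                          reinterpret_cast<fftw_complex*>(f_ptr),
                          NULL /*onembed*/, (int)lda /*ostride*/, 1 /*odist*/,
                          FFTW_FORWARD, FFTW_ESTIMATE);
  /* FFTW_ESTIMATE does not overwrite the arrays during planning, so planning
     on the live matrix data is fine here. */
  fftw_execute(fplan);
  fftw_destroy_plan(fplan);

  MatDenseRestoreArray(F, &f_ptr);
  MatDenseRestoreArray(C, &c_ptr);
}
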
On 08.12.20 at 14:55, Matthew Knepley wrote:
> On Tue, Dec 8, 2020 at 8:40 AM Roland Richter <roland.richter at ntnu.no> wrote:
>
> Dear all,
>
> I tried the following code:
>
> int main(int argc, char **args) {
>   Mat C, F;
>   Vec x, y, z;
>   PetscViewer viewer;
>   PetscMPIInt rank, size;
>   PetscInitialize(&argc, &args, (char*) 0, help);
>
>   MPI_Comm_size(PETSC_COMM_WORLD, &size);
>   MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
>
>   PetscPrintf(PETSC_COMM_WORLD, "Number of processors = %d, rank = %d\n", size, rank);
>   // std::cout << "From rank " << rank << '\n';
>
>   //MatCreate(PETSC_COMM_WORLD, &C);
>   PetscViewerCreate(PETSC_COMM_WORLD, &viewer);
>   PetscViewerSetType(viewer, PETSCVIEWERASCII);
>   arma::cx_mat local_mat, local_zero_mat;
>   const size_t matrix_size = 5;
>
>   MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size, matrix_size, NULL, &C);
>   MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, matrix_size, matrix_size, NULL, &F);
>   if (rank == 0) {
>     arma::Col<int> indices = arma::linspace<arma::Col<int>>(0, matrix_size - 1, matrix_size);
>     //if(rank == 0) {
>     local_mat = arma::randu<arma::cx_mat>(matrix_size, matrix_size);
>     local_zero_mat = arma::zeros<arma::cx_mat>(matrix_size, matrix_size);
>     arma::cx_mat tmp_mat = local_mat.st();
>     MatSetValues(C, matrix_size, indices.memptr(), matrix_size, indices.memptr(), tmp_mat.memptr(), INSERT_VALUES);
>     MatSetValues(F, matrix_size, indices.memptr(), matrix_size, indices.memptr(), local_zero_mat.memptr(), INSERT_VALUES);
>   }
>
>   MatAssemblyBegin(C, MAT_FINAL_ASSEMBLY);
>   MatAssemblyEnd(C, MAT_FINAL_ASSEMBLY);
>   MatAssemblyBegin(F, MAT_FINAL_ASSEMBLY);
>   MatAssemblyEnd(F, MAT_FINAL_ASSEMBLY);
>
>   //FFT test
>   Mat FFT_A;
>   Vec input, output;
>   int first_owned_row_index = 0, last_owned_row_index = 0;
>   const int FFT_length[] = {matrix_size};
>
>   MatCreateFFT(PETSC_COMM_WORLD, 1, FFT_length, MATFFTW, &FFT_A);
>   MatCreateVecsFFTW(FFT_A, &x, &y, &z);
>   VecCreate(PETSC_COMM_WORLD, &input);
>   VecSetFromOptions(input);
>   VecSetSizes(input, PETSC_DECIDE, matrix_size);
>   VecCreate(PETSC_COMM_WORLD, &output);
>   VecSetFromOptions(output);
>   VecSetSizes(output, PETSC_DECIDE, matrix_size);
>   MatGetOwnershipRange(C, &first_owned_row_index, &last_owned_row_index);
>   std::cout << "Rank " << rank << " owns row " << first_owned_row_index << " to row " << last_owned_row_index << '\n';
>
>   //Testing FFT
>   /*---------------------------------------------------------*/
>   fftw_plan fplan, bplan;
>   fftw_complex *data_in, *data_out, *data_out2;
>   ptrdiff_t alloc_local, local_ni, local_i_start, local_n0, local_0_start;
>   PetscRandom rdm;
>
>   // if (!rank)
>   //   printf("Use FFTW without PETSc-FFTW interface\n");
>   fftw_mpi_init();
>   int N = matrix_size * matrix_size;
>   int N0 = matrix_size;
>   int N1 = matrix_size;
>   const ptrdiff_t n_data[] = {N0, 1};
>   //alloc_local = fftw_mpi_local_size_2d(N0, N1, PETSC_COMM_WORLD, &local_n0, &local_0_start);
>   alloc_local = fftw_mpi_local_size_many(1, n_data,
>                                          matrix_size,
>                                          FFTW_MPI_DEFAULT_BLOCK,
>                                          PETSC_COMM_WORLD,
>                                          &local_n0,
>                                          &local_0_start);
>   //data_in = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);
>   PetscScalar *C_ptr, *F_ptr;
>   MatDenseGetArray(C, &C_ptr);
>   MatDenseGetArray(F, &F_ptr);
>   data_in = reinterpret_cast<fftw_complex*>(C_ptr);
>   data_out = reinterpret_cast<fftw_complex*>(F_ptr);
>   data_out2 = (fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);
>
>   VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1, (PetscInt)N, (const PetscScalar*)data_in, &x);
>   PetscObjectSetName((PetscObject) x, "Real Space vector");
>   VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1, (PetscInt)N, (const PetscScalar*)data_out, &y);
>   PetscObjectSetName((PetscObject) y, "Frequency space vector");
>   VecCreateMPIWithArray(PETSC_COMM_WORLD, 1, (PetscInt)local_n0 * N1, (PetscInt)N, (const PetscScalar*)data_out2, &z);
>   PetscObjectSetName((PetscObject) z, "Reconstructed vector");
>
>   int FFT_rank = 1;
>   const ptrdiff_t FFTW_size[] = {matrix_size};
>   int howmany = last_owned_row_index - first_owned_row_index;
>   //std::cout << "Rank " << rank << " processes " << howmany << " rows\n";
>   int idist = matrix_size; //1;
>   int odist = matrix_size; //1;
>   int istride = 1; //matrix_size;
>   int ostride = 1; //matrix_size;
>   const ptrdiff_t *inembed = FFTW_size, *onembed = FFTW_size;
>   fplan = fftw_mpi_plan_many_dft(FFT_rank, FFTW_size,
>                                  howmany,
>                                  FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,
>                                  data_in, data_out,
>                                  PETSC_COMM_WORLD,
>                                  FFTW_FORWARD, FFTW_ESTIMATE);
>   bplan = fftw_mpi_plan_many_dft(FFT_rank, FFTW_size,
>                                  howmany,
>                                  FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,
>                                  data_out, data_out2,
>                                  PETSC_COMM_WORLD,
>                                  FFTW_BACKWARD, FFTW_ESTIMATE);
>
>   if (false) {VecView(x, PETSC_VIEWER_STDOUT_WORLD);}
>
>   fftw_execute(fplan);
>   if (false) {VecView(y, PETSC_VIEWER_STDOUT_WORLD);}
>
>   fftw_execute(bplan);
>
>   double a = 1.0 / matrix_size;
>   double enorm = 0;
>   VecScale(z, a);
>   if (false) {VecView(z, PETSC_VIEWER_STDOUT_WORLD);}
>   VecAXPY(z, -1.0, x);
>   VecNorm(z, NORM_1, &enorm);
>   if (enorm > 1.e-11) {
>     PetscPrintf(PETSC_COMM_SELF, " Error norm of |x - z| %g\n", (double)enorm);
>   }
>
>   /* Free spaces */
>   fftw_destroy_plan(fplan);
>   fftw_destroy_plan(bplan);
>   fftw_free(data_out2);
>
>   //Generate test matrix for comparison
>   arma::cx_mat fft_test_mat = local_mat;
>   fft_test_mat.each_row([&](arma::cx_rowvec &a){
>     a = arma::fft(a);
>   });
>   std::cout << "-----------------------------------------------------\n";
>   std::cout << "Input matrix:\n" << local_mat << '\n';
>   MatView(C, viewer);
>   std::cout << "-----------------------------------------------------\n";
>   std::cout << "Expected output matrix:\n" << fft_test_mat << '\n';
>   MatView(F, viewer);
>   std::cout << "-----------------------------------------------------\n";
>   MatDestroy(&FFT_A);
>   VecDestroy(&input);
>   VecDestroy(&output);
>   VecDestroy(&x);
>   VecDestroy(&y);
>   VecDestroy(&z);
>   MatDestroy(&C);
>   MatDestroy(&F);
>   PetscViewerDestroy(&viewer);
>   PetscFinalize();
>   return 0;
> }
>
> For *mpirun -n 1* I get the expected output (i.e. armadillo and
> PETSc/FFTW return the same result), but for *mpirun -n x* with x >
> 1 every value which is not assigned to rank 0 is lost and set to
> zero instead. Every value assigned to rank 0 is calculated
> correctly, as far as I can see. Did I forget something here?
>
> I do not understand why your FFTW calls use the WORLD communicator.
> Aren't they serial FFTs over the local rows?
>
> Thanks,
>
> Matt
>
> Thanks,
>
> Roland
>
> On 05.12.20 at 01:59, Barry Smith wrote:
>>
>> Roland,
>>
>> If you store your matrix as described in a parallel PETSc dense
>> matrix, then you should be able to call fftw_plan_many_dft() directly
>> on the value obtained with MatDenseGetArray(). You just need to pass
>> the arguments regarding column-major ordering appropriately, probably
>> identically to what you did in your previous code.
>>
>> Barry
>>
>>
>>> On Dec 4, 2020, at 6:47 AM, Roland Richter <roland.richter at ntnu.no> wrote:
>>>
>>> Ideally those FFTs could be handled in parallel, since they do not
>>> depend on each other. Is that possible with MatFFT, or should I
>>> rather use FFTW for that?
>>>
>>> Thanks,
>>>
>>> Roland
>>>
>>>> On 04.12.20 at 13:19, Matthew Knepley wrote:
>>>> On Fri, Dec 4, 2020 at 5:32 AM Roland Richter <roland.richter at ntnu.no> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am currently working on a problem that requires a large number of
>>>> transformations of a field E(r, t) from time space to Fourier space
>>>> E(r, w) and back. The field is described by a 2d matrix, with the
>>>> r-dimension along the columns and the t-dimension along the rows.
>>>>
>>>> For the transformation from time to frequency space and back I
>>>> therefore have to apply a 1d FFT over each row of my matrix. In my
>>>> earlier attempts I used armadillo as the matrix library and FFTW for
>>>> the transformations. There I could use fftw_plan_many_dft to do all
>>>> FFTs at the same time. Unfortunately, armadillo does not support
>>>> MPI, and therefore I had to switch to PETSc for larger matrices.
>>>>
>>>> Based on the examples (such as example 143), PETSc can do FFTs
>>>> internally by creating an FFT object (using MatCreateFFT).
>>>> Unfortunately, I cannot see how I could use that object to conduct
>>>> the operation described above without iterating over each row of my
>>>> original matrix (i.e. doing it sequentially, not in parallel).
>>>>
>>>> Ideally I could distribute the FFTs over my nodes such that each
>>>> node takes several rows of the original matrix and applies the FFT
>>>> to each of them. As an example, for a 4x4 matrix and two nodes,
>>>> node 0 would take rows 0 and 1 while node 1 takes rows 2 and 3, to
>>>> avoid unnecessary memory transfer between the nodes while
>>>> conducting the FFTs. Is that something PETSc can do, too?
>>>>
>>>>
>>>> The way I understand our setup (I did not write it), we use
>>>> plan_many_dft to handle multiple-dof FFTs, but these would be
>>>> interlaced. You want many FFTs for non-interlaced storage, which is
>>>> not something we do right now. You could definitely call FFTW
>>>> directly if you want.
>>>>
>>>> Second, from the above it seems like you just want serial FFTs. You
>>>> can definitely create a MatFFT with PETSC_COMM_SELF and apply it to
>>>> each of the local rows, or create the plan yourself for the stack of
>>>> rows.
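
A minimal, untested sketch of this second suggestion (the helper name
fft_local_rows and the use of MatDenseGetLDA() are illustrative; a
complex-scalar PETSc build is assumed): create a serial 1d MatFFT on
PETSC_COMM_SELF, gather each locally owned row out of the column-major
local block into an FFTW work vector, apply the FFT with MatMult, and
scatter the result back.

#include <petscmat.h>

/* Sketch: serial 1-D FFT (MATFFTW on PETSC_COMM_SELF) applied to each local
   row. Assumes PetscScalar is complex. */
static PetscErrorCode fft_local_rows(Mat C, Mat F, PetscInt n_cols)
{
  Mat               A;
  Vec               x, y, z;
  PetscInt          first, last, lda, dim[1] = {n_cols};
  PetscScalar       *c_ptr, *f_ptr, *xa;
  const PetscScalar *ya;

  MatCreateFFT(PETSC_COMM_SELF, 1, dim, MATFFTW, &A);
  MatCreateVecsFFTW(A, &x, &y, &z);        /* FFTW-aligned work vectors */

  MatGetOwnershipRange(C, &first, &last);
  MatDenseGetLDA(C, &lda);
  MatDenseGetArray(C, &c_ptr);
  MatDenseGetArray(F, &f_ptr);

  for (PetscInt r = 0; r < last - first; ++r) {
    VecGetArray(x, &xa);
    for (PetscInt j = 0; j < n_cols; ++j) xa[j] = c_ptr[r + j * lda];   /* gather row r */
    VecRestoreArray(x, &xa);

    MatMult(A, x, y);                      /* forward FFT of this row */

    VecGetArrayRead(y, &ya);
    for (PetscInt j = 0; j < n_cols; ++j) f_ptr[r + j * lda] = ya[j];   /* scatter result */
    VecRestoreArrayRead(y, &ya);
  }

  MatDenseRestoreArray(F, &f_ptr);
  MatDenseRestoreArray(C, &c_ptr);
  VecDestroy(&x); VecDestroy(&y); VecDestroy(&z);
  MatDestroy(&A);
  return 0;
}
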
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>>
>>>> Thanks!
>>>>
>>>> Regards,
>>>>
>>>> Roland
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin
>>>> their experiments is infinitely more interesting than any
>>>> results to which their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/