<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>I replaced the FFT-code with the following:</p>
<p><i> int FFT_rank = 1;</i><i><br>
</i><i> const int FFTW_size[] = {matrix_size};</i><i><br>
</i><i> int howmany = last_owned_row_index -
first_owned_row_index;</i><i><br>
</i><i> int idist = 1;</i><i><br>
</i><i> int odist = 1;</i><i><br>
</i><i> int istride = matrix_size / size;</i><i><br>
</i><i> int ostride = matrix_size / size;</i><i><br>
</i><i> const int inembed[] = {matrix_size}, onembed[] =
{matrix_size};</i><i><br>
</i><i> fplan = fftw_plan_many_dft(FFT_rank, FFTW_size,</i><i><br>
</i><i> howmany,</i><i><br>
</i><i> data_in, inembed, istride,
idist,</i><i><br>
</i><i> data_out, onembed, ostride,
odist,</i><i><br>
</i><i> FFTW_FORWARD,
FFTW_ESTIMATE);</i><i><br>
</i><i> bplan = fftw_plan_many_dft(FFT_rank, FFTW_size,</i><i><br>
</i><i> howmany,</i><i><br>
</i><i> data_out, inembed, istride,
idist,</i><i><br>
</i><i> data_out2, onembed, ostride,
odist,</i><i><br>
</i><i> FFTW_BACKWARD,
FFTW_ESTIMATE);</i></p>
<p>Now I get the expected results also for <b>mpirun -n x</b> with
x > 1, but only if my matrix size is an integer multiple of x,
else some parts of the resulting matrix are zeroed out. Therefore,
I assume I made a mistake here with inembed and onembed, but I am
not sure how they influence the result. Do you have any
suggestions?</p>
<p>Thanks,</p>
<p>Roland<br>
</p>
<div class="moz-cite-prefix">Am 08.12.20 um 14:55 schrieb Matthew
Knepley:<br>
</div>
<blockquote type="cite"
cite="mid:CAMYG4GnWsRAxknDTE9pL9TJLMFe10CHbosmP5iF1g8nt3Z4vDw@mail.gmail.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">On Tue, Dec 8, 2020 at 8:40 AM Roland Richter
<<a href="mailto:roland.richter@ntnu.no"
moz-do-not-send="true">roland.richter@ntnu.no</a>> wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Dear all,</p>
<p>I tried the following code:</p>
<p><i>int main(int argc, char **args) {</i><i><br>
</i><i> Mat C, F;</i><i><br>
</i><i> Vec x, y, z;</i><i><br>
</i><i> PetscViewer viewer;</i><i><br>
</i><i> PetscMPIInt rank, size;</i><i><br>
</i><i> PetscInitialize(&argc, &args, (char*)
0, help);</i><i><br>
</i><i><br>
</i><i> MPI_Comm_size(PETSC_COMM_WORLD, &size);</i><i><br>
</i><i> MPI_Comm_rank(PETSC_COMM_WORLD, &rank);</i><i><br>
</i><i><br>
</i><i> PetscPrintf(PETSC_COMM_WORLD,"Number of
processors = %d, rank = %d\n", size, rank);</i><i><br>
</i><i> // std::cout << "From rank "
<< rank << '\n';</i><i><br>
</i><i><br>
</i><i> //MatCreate(PETSC_COMM_WORLD, &C);</i><i><br>
</i><i> PetscViewerCreate(PETSC_COMM_WORLD,
&viewer);</i><i><br>
</i><i> PetscViewerSetType(viewer, PETSCVIEWERASCII);</i><i><br>
</i><i> arma::cx_mat local_mat, local_zero_mat;</i><i><br>
</i><i> const size_t matrix_size = 5;</i><i><br>
</i><i><br>
</i><i> MatCreateDense(PETSC_COMM_WORLD,
PETSC_DECIDE, PETSC_DECIDE, matrix_size, matrix_size,
NULL, &C);</i><i><br>
</i><i> MatCreateDense(PETSC_COMM_WORLD,
PETSC_DECIDE, PETSC_DECIDE, matrix_size, matrix_size,
NULL, &F);</i><i><br>
</i><i> if(rank == 0) {</i><i><br>
</i><i> arma::Col<int> indices =
arma::linspace<arma::Col<int>>(0,
matrix_size - 1, matrix_size);</i><i><br>
</i><i> //if(rank == 0) {</i><i><br>
</i><i> local_mat =
arma::randu<arma::cx_mat>(matrix_size,
matrix_size);</i><i><br>
</i><i> local_zero_mat =
arma::zeros<arma::cx_mat>(matrix_size,
matrix_size);</i><i><br>
</i><i> arma::cx_mat tmp_mat = <a
href="http://local_mat.st" target="_blank"
moz-do-not-send="true">local_mat.st</a>();</i><i><br>
</i><i> MatSetValues(C, matrix_size,
indices.memptr(), matrix_size, indices.memptr(),
tmp_mat.memptr(), INSERT_VALUES);</i><i><br>
</i><i> MatSetValues(F, matrix_size,
indices.memptr(), matrix_size, indices.memptr(),
local_zero_mat.memptr(), INSERT_VALUES);</i><i><br>
</i><i> }</i><i><br>
</i><i><br>
</i><i> MatAssemblyBegin(C, MAT_FINAL_ASSEMBLY);</i><i><br>
</i><i> MatAssemblyEnd(C, MAT_FINAL_ASSEMBLY);</i><i><br>
</i><i> MatAssemblyBegin(F, MAT_FINAL_ASSEMBLY);</i><i><br>
</i><i> MatAssemblyEnd(F, MAT_FINAL_ASSEMBLY);</i><i><br>
</i><i><br>
</i><i> //FFT test</i><i><br>
</i><i> Mat FFT_A;</i><i><br>
</i><i> Vec input, output;</i><i><br>
</i><i> int first_owned_row_index = 0,
last_owned_row_index = 0;</i><i><br>
</i><i> const int FFT_length[] = {matrix_size};</i><i><br>
</i><i><br>
</i><i><br>
</i><i> MatCreateFFT(PETSC_COMM_WORLD, 1, FFT_length,
MATFFTW, &FFT_A);</i><i><br>
</i><i> MatCreateVecsFFTW(FFT_A, &x, &y,
&z);</i><i><br>
</i><i> VecCreate(PETSC_COMM_WORLD, &input);</i><i><br>
</i><i> VecSetFromOptions(input);</i><i><br>
</i><i> VecSetSizes(input, PETSC_DECIDE,
matrix_size);</i><i><br>
</i><i> VecCreate(PETSC_COMM_WORLD, &output);</i><i><br>
</i><i> VecSetFromOptions(output);</i><i><br>
</i><i> VecSetSizes(output, PETSC_DECIDE,
matrix_size);</i><i><br>
</i><i> MatGetOwnershipRange(C,
&first_owned_row_index,
&last_owned_row_index);</i><i><br>
</i><i> std::cout << "Rank " << rank
<< " owns row " << first_owned_row_index
<< " to row " << last_owned_row_index
<< '\n';</i><i><br>
</i><i><br>
</i><i> //Testing FFT</i><i><br>
</i><i><br>
</i><i>
/*---------------------------------------------------------*/</i><i><br>
</i><i> fftw_plan fplan,bplan;</i><i><br>
</i><i> fftw_complex *data_in,*data_out,*data_out2;</i><i><br>
</i><i> ptrdiff_t alloc_local, local_ni,
local_i_start, local_n0,local_0_start;</i><i><br>
</i><i> PetscRandom rdm;</i><i><br>
</i><i><br>
</i><i> // if (!rank)</i><i><br>
</i><i> // printf("Use FFTW without PETSc-FFTW
interface\n");</i><i><br>
</i><i> fftw_mpi_init();</i><i><br>
</i><i> int N = matrix_size * matrix_size;</i><i><br>
</i><i> int N0 = matrix_size;</i><i><br>
</i><i> int N1 = matrix_size;</i><i><br>
</i><i> const ptrdiff_t n_data[] = {N0, 1};</i><i><br>
</i><i> //alloc_local =
fftw_mpi_local_size_2d(N0,N1,PETSC_COMM_WORLD,&local_n0,&local_0_start);</i><i><br>
</i><i> alloc_local = fftw_mpi_local_size_many(1,
n_data,</i><i><br>
</i><i>
matrix_size,</i><i><br>
</i><i>
FFTW_MPI_DEFAULT_BLOCK,</i><i><br>
</i><i>
PETSC_COMM_WORLD,</i><i><br>
</i><i>
&local_n0,</i><i><br>
</i><i>
&local_0_start);</i><i><br>
</i><i> //data_in =
(fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);</i><i><br>
</i><i> PetscScalar *C_ptr, *F_ptr;</i><i><br>
</i><i> MatDenseGetArray(C, &C_ptr);</i><i><br>
</i><i> MatDenseGetArray(F, &F_ptr);</i><i><br>
</i><i> data_in =
reinterpret_cast<fftw_complex*>(C_ptr);</i><i><br>
</i><i> data_out =
reinterpret_cast<fftw_complex*>(F_ptr);</i><i><br>
</i><i> data_out2 =
(fftw_complex*)fftw_malloc(sizeof(fftw_complex)*alloc_local);</i><i><br>
</i><i><br>
</i><i><br>
</i><i>
VecCreateMPIWithArray(PETSC_COMM_WORLD,1,(PetscInt)local_n0
* N1,(PetscInt)N,(const PetscScalar*)data_in,&x);</i><i><br>
</i><i> PetscObjectSetName((PetscObject) x, "Real
Space vector");</i><i><br>
</i><i>
VecCreateMPIWithArray(PETSC_COMM_WORLD,1,(PetscInt)local_n0
* N1,(PetscInt)N,(const PetscScalar*)data_out,&y);</i><i><br>
</i><i> PetscObjectSetName((PetscObject) y,
"Frequency space vector");</i><i><br>
</i><i>
VecCreateMPIWithArray(PETSC_COMM_WORLD,1,(PetscInt)local_n0
* N1,(PetscInt)N,(const
PetscScalar*)data_out2,&z);</i><i><br>
</i><i> PetscObjectSetName((PetscObject) z,
"Reconstructed vector");</i><i><br>
</i><i><br>
</i><i> int FFT_rank = 1;</i><i><br>
</i><i> const ptrdiff_t FFTW_size[] = {matrix_size};</i><i><br>
</i><i> int howmany = last_owned_row_index -
first_owned_row_index;</i><i><br>
</i><i> //std::cout << "Rank " << rank
<< " processes " << howmany << "
rows\n";</i><i><br>
</i><i> int idist = matrix_size;//1;</i><i><br>
</i><i> int odist = matrix_size;//1;</i><i><br>
</i><i> int istride = 1;//matrix_size;</i><i><br>
</i><i> int ostride = 1;//matrix_size;</i><i><br>
</i><i> const ptrdiff_t *inembed = FFTW_size,
*onembed = FFTW_size;</i><i><br>
</i><i> fplan = fftw_mpi_plan_many_dft(FFT_rank,
FFTW_size,</i><i><br>
</i><i> howmany,</i><i><br>
</i><i>
FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,</i><i><br>
</i><i> data_in,
data_out,</i><i><br>
</i><i>
PETSC_COMM_WORLD,</i><i><br>
</i><i> FFTW_FORWARD,
FFTW_ESTIMATE);</i><i><br>
</i><i> bplan = fftw_mpi_plan_many_dft(FFT_rank,
FFTW_size,</i><i><br>
</i><i> howmany,</i><i><br>
</i><i>
FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK,</i><i><br>
</i><i> data_out,
data_out2,</i><i><br>
</i><i>
PETSC_COMM_WORLD,</i><i><br>
</i><i> FFTW_BACKWARD,
FFTW_ESTIMATE);</i><i><br>
</i><i><br>
</i><i> if (false)
{VecView(x,PETSC_VIEWER_STDOUT_WORLD);}</i><i><br>
</i><i><br>
</i><i> fftw_execute(fplan);</i><i><br>
</i><i> if (false)
{VecView(y,PETSC_VIEWER_STDOUT_WORLD);}</i><i><br>
</i><i><br>
</i><i> fftw_execute(bplan);</i><i><br>
</i><i><br>
</i><i> double a = 1.0 / matrix_size;</i><i><br>
</i><i> double enorm = 0;</i><i><br>
</i><i> VecScale(z,a);</i><i><br>
</i><i> if (false) {VecView(z,
PETSC_VIEWER_STDOUT_WORLD);}</i><i><br>
</i><i> VecAXPY(z,-1.0,x);</i><i><br>
</i><i> VecNorm(z,NORM_1,&enorm);</i><i><br>
</i><i> if (enorm > 1.e-11) {</i><i><br>
</i><i> PetscPrintf(PETSC_COMM_SELF," Error norm
of |x - z| %g\n",(double)enorm);</i><i><br>
</i><i> }</i><i><br>
</i><i><br>
</i><i> /* Free spaces */</i><i><br>
</i><i> fftw_destroy_plan(fplan);</i><i><br>
</i><i> fftw_destroy_plan(bplan);</i><i><br>
</i><i> fftw_free(data_out2);</i><i><br>
</i><i><br>
</i><i> //Generate test matrix for comparison</i><i><br>
</i><i> arma::cx_mat fft_test_mat = local_mat;</i><i><br>
</i><i> fft_test_mat.each_row([&](arma::cx_rowvec
&a){</i><i><br>
</i><i> a = arma::fft(a);</i><i><br>
</i><i> });</i><i><br>
</i><i> std::cout <<
"-----------------------------------------------------\n";</i><i><br>
</i><i> std::cout << "Input matrix:\n" <<
local_mat << '\n';</i><i><br>
</i><i> MatView(C, viewer);</i><i><br>
</i><i> std::cout <<
"-----------------------------------------------------\n";</i><i><br>
</i><i> std::cout << "Expected output
matrix:\n" << fft_test_mat << '\n';</i><i><br>
</i><i> MatView(F, viewer);</i><i><br>
</i><i> std::cout <<
"-----------------------------------------------------\n";</i><i><br>
</i><i> MatDestroy(&FFT_A);</i><i><br>
</i><i> VecDestroy(&input);</i><i><br>
</i><i> VecDestroy(&output);</i><i><br>
</i><i> VecDestroy(&x);</i><i><br>
</i><i> VecDestroy(&y);</i><i><br>
</i><i> VecDestroy(&z);</i><i><br>
</i><i> MatDestroy(&C);</i><i><br>
</i><i> MatDestroy(&F);</i><i><br>
</i><i> PetscViewerDestroy(&viewer);</i><i><br>
</i><i> PetscFinalize();</i><i><br>
</i><i> return 0;</i><i><br>
</i><i>}</i></p>
<p>For <b>mpirun -n 1</b> I get the expected output (i.e.
armadillo and PETSc/FFTW return the same result), but
for <b>mpirun -n x</b> with x > 1 every value which
is not assigned to rank 0 is lost and set to zero
instead. Every value assigned to rank 0 is calculated
correctly, as far as I can see. Did I forget something
here?</p>
</div>
</blockquote>
<div>I do not understand why your FFTW calls use the WORLD
communicator. Aren't they serial FFTs over the local rows?</div>
<div><br>
</div>
<div> THanks,</div>
<div><br>
</div>
<div> Matt</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Thanks,</p>
<p>Roland<br>
</p>
<div>Am 05.12.20 um 01:59 schrieb Barry Smith:<br>
</div>
<blockquote type="cite">
<div><br>
</div>
Roland,
<div><br>
</div>
<div> If you store your matrix as described in a
parallel PETSc dense matrix then you should be able to
call </div>
<div><br>
</div>
<div>fftw_plan_many_dft() directly on the value obtained
with MatDenseGetArray(). You just need to pass the
arguments regarding column major ordering
appropriately. Probably identically to what you do
with your previous code.</div>
<div><br>
</div>
<div> Barry</div>
<div><br>
<div><br>
<blockquote type="cite">
<div>On Dec 4, 2020, at 6:47 AM, Roland Richter
<<a href="mailto:roland.richter@ntnu.no"
target="_blank" moz-do-not-send="true">roland.richter@ntnu.no</a>>
wrote:</div>
<br>
<div>
<div>
<p>Ideally those FFTs could be handled in
parallel, after they are not depending on
each other. Is that possible with MatFFT, or
should I rather use FFTW for that?</p>
<p>Thanks,</p>
<p>Roland<br>
</p>
<div>Am 04.12.20 um 13:19 schrieb Matthew
Knepley:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">On Fri, Dec 4, 2020 at 5:32
AM Roland Richter <<a
href="mailto:roland.richter@ntnu.no"
target="_blank" moz-do-not-send="true">roland.richter@ntnu.no</a>>
wrote:<br>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Hei,<br>
<br>
I am currently working on a problem
which requires a large amount of<br>
transformations of a field E(r, t)
from time space to Fourier space E(r,<br>
w) and back. The field is described in
a 2d-matrix, with the r-dimension<br>
along the columns and the t-dimension
along the rows.<br>
<br>
For the transformation from time to
frequency space and back I therefore<br>
have to apply a 1d-FFT operation over
each row of my matrix. For my<br>
earlier attempts I used armadillo as
matrix library and FFTW for doing<br>
the transformations. Here I could use
fftw_plan_many_dft to do all FFTs<br>
at the same time. Unfortunately,
armadillo does not support MPI, and<br>
therefore I had to switch to PETSc for
larger matrices.<br>
<br>
Based on the examples (such as example
143) PETSc has a way of doing<br>
FFTs internally by creating an FFT
object (using MatCreateFFT).<br>
Unfortunately, I can not see how I
could use that object to conduct the<br>
operation described above without
having to iterate over each row in my<br>
original matrix (i.e. doing it
sequential, not in parallel).<br>
<br>
Ideally I could distribute the FFTs
such over my nodes that each node<br>
takes several rows of the original
matrix and applies the FFT to each of<br>
them. As example, for a matrix with a
size of 4x4 and two nodes node 0<br>
would take row 0 and 1, while node 1
takes row 2 and 3, to avoid<br>
unnecessary memory transfer between
the nodes while conducting the FFTs.<br>
Is that something PETSc can do, too?<br>
</blockquote>
<div><br>
</div>
<div>The way I understand our setup (I
did not write it), we use
plan_many_dft to handle</div>
<div>multiple dof FFTs, but these would
be interlaced. You want many FFTs for
non-interlaced</div>
<div>storage, which is not something we
do right now. You could definitely
call FFTW directly</div>
<div>if you want.</div>
<div><br>
</div>
<div>Second, above it seems like you
just want serial FFTs. You can
definitely create a MatFFT</div>
<div>with PETSC_COMM_SELF, and apply it
to each row in the local rows, or
create the plan</div>
<div>yourself for the stack of rows.</div>
<div><br>
</div>
<div> Thanks,</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
Thanks!<br>
<br>
Regards,<br>
<br>
Roland<br>
<br>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters
take for granted before they
begin their experiments is
infinitely more interesting
than any results to which
their experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a
href="http://www.cse.buffalo.edu/~knepley/"
target="_blank"
moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>What most experimenters take for granted before
they begin their experiments is infinitely more
interesting than any results to which their
experiments lead.<br>
-- Norbert Wiener</div>
<div><br>
</div>
<div><a href="http://www.cse.buffalo.edu/~knepley/"
target="_blank" moz-do-not-send="true">https://www.cse.buffalo.edu/~knepley/</a><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>