<html><body><div><div><div>I cannot declare everything as PetscScalar. My strategy is to compute the matrix elements on the GPU block by block and copy them back to the CPU, and then to compute the eigenvalues with SLEPc on the CPU.<br></div><div class="x-apple-signature" style="white-space: pre-wrap">--------------------
Langtian Liu
Institute for Theoretical Physics, Justus-Liebig-University Giessen
Heinrich-Buff-Ring 16, 35392 Giessen Germany
email: <a href="mailto:langtian.liu@icloud.com">langtian.liu@icloud.com</a>
Tel: (+49)641 99 33342<br></div></div><div><br></div><blockquote type="cite"><div>On Oct 2, 2024, at 11:31 AM, Jose E. Roman &lt;jroman@dsic.upv.es&gt; wrote:<br></div><div><br></div><div><br></div><div><div><div>Does it work if you declare everything as PetscScalar instead of cuDoubleComplex?<br></div><div><br></div><blockquote type="cite"><div>On Oct 2, 2024, at 11:23, 刘浪天 &lt;langtian.liu@icloud.com&gt; wrote:<br></div><div><br></div><div>Hi Jose,<br></div><div><br></div><div>Since my matrix is too large, I cannot create the Mat on the GPU, so I still want to create the matrix and compute its eigenvalues on the CPU using SLEPc.<br></div><div><br></div><div>Best,<br></div><div>Langtian Liu<br></div><div><br></div><blockquote type="cite"><div>On Oct 2, 2024, at 11:18 AM, Jose E. Roman &lt;jroman@dsic.upv.es&gt; wrote:<br></div><div><br></div><div><br></div><div>For the CUDA case you should use MatCreateDenseCUDA() instead of MatCreateDense(). With it you pass a pointer to data that resides in GPU memory. But I guess "new cuDoubleComplex[dim*dim]" allocates on the CPU; you should use cudaMalloc() instead.<br></div><div><br></div><div>Jose<br></div><div><br></div><div><br></div><blockquote type="cite"><div>On Oct 2, 2024, at 10:56, 刘浪天 via petsc-users &lt;petsc-users@mcs.anl.gov&gt; wrote:<br></div><div><br></div><div>Hi all,<br></div><div><br></div><div>I am using PETSc and SLEPc to solve the Faddeev equation of baryons. 
I ran into a problem with the function MatCreateDense when switching from a CPU-only to a CPU-GPU computation.<br></div><div>At first I wrote the code for a purely CPU computation in the following way, and it works.<br></div><div>```<br></div><div>Eigen::MatrixXcd H_KER;<br></div><div>Eigen::MatrixXcd G0;<br></div><div>printf("\nCompute the propagator matrix.\n");<br></div><div>prop_matrix_nucleon_sc_av(Mn, pp_nodes, cos1_nodes);<br></div><div>printf("\nCompute the propagator matrix done.\n");<br></div><div>printf("\nCompute the kernel matrix.\n");<br></div><div>bse_kernel_nucleon_sc_av(Mn, pp_nodes, pp_weights, cos1_nodes, cos1_weights);<br></div><div>printf("\nCompute the kernel matrix done.\n");<br></div><div>printf("\nCompute the full kernel matrix by multiplying kernel and propagator matrix.\n");<br></div><div>MatrixXcd kernel_temp = H_KER * G0;<br></div><div>printf("\nCompute the full kernel matrix done.\n");<br></div><div><br></div><div>// Solve the eigen system with SLEPc<br></div><div>printf("\nSolve the eigen system in the rest frame.\n");<br></div><div>// Get the size of the Eigen matrix<br></div><div>int nRows = (int) kernel_temp.rows();<br></div><div>int nCols = (int) kernel_temp.cols();<br></div><div>// Create PETSc matrix and share the data of kernel_temp<br></div><div>Mat kernel;<br></div><div>PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, nRows, nCols, kernel_temp.data(), &kernel));<br></div><div>PetscCall(MatAssemblyBegin(kernel, MAT_FINAL_ASSEMBLY));<br></div><div>PetscCall(MatAssemblyEnd(kernel, MAT_FINAL_ASSEMBLY));<br></div><div>```<br></div><div>Now I have changed the code to compute the propagator and kernel matrices on the GPU and then compute the largest eigenvalues on the CPU using SLEPc, as shown below.<br></div><div>```<br></div><div>cuDoubleComplex *h_propmat;<br></div><div>cuDoubleComplex *h_kernelmat;<br></div><div>int dim = EIGHT * NP * NZ;<br></div><div>printf("\nCompute the propagator 
matrix.\n");<br></div><div>prop_matrix_nucleon_sc_av_cuda(Mn, pp_nodes.data(), cos1_nodes.data());<br></div><div>printf("\nCompute the propagator matrix done.\n");<br></div><div>printf("\nCompute the kernel matrix.\n");<br></div><div>kernel_matrix_nucleon_sc_av_cuda(Mn, pp_nodes.data(), pp_weights.data(), cos1_nodes.data(), cos1_weights.data());<br></div><div>printf("\nCompute the kernel matrix done.\n");<br></div><div>printf("\nCompute the full kernel matrix by multiplying kernel and propagator matrix.\n");<br></div><div>// Map the raw arrays to Eigen matrices (column-major order)<br></div><div>auto *h_kernel_temp = new cuDoubleComplex [dim*dim];<br></div><div>matmul_cublas_cuDoubleComplex(h_kernelmat,h_propmat,h_kernel_temp,dim,dim,dim);<br></div><div>printf("\nCompute the full kernel matrix done.\n");<br></div><div><br></div><div>// Solve the eigen system with SLEPc<br></div><div>printf("\nSolve the eigen system in the rest frame.\n");<br></div><div>int nRows = dim;<br></div><div>int nCols = dim;<br></div><div>// Create PETSc matrix and share the data of kernel_temp<br></div><div>Mat kernel;<br></div><div>auto* h_kernel = (std::complex&lt;double&gt;*)(h_kernel_temp);<br></div><div>PetscCall(MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, nRows, nCols, h_kernel_temp, &kernel));<br></div><div>PetscCall(MatAssemblyBegin(kernel, MAT_FINAL_ASSEMBLY));<br></div><div>PetscCall(MatAssemblyEnd(kernel, MAT_FINAL_ASSEMBLY));<br></div><div>```<br></div><div>But in this case the compiler tells me that MatCreateDense expects the data pointer to be of type "thrust::complex&lt;double&gt;" instead of "std::complex&lt;double&gt;".<br></div><div>I am sure that I configured and installed PETSc for CPU only, without GPU support, and this code is written in a host function.<br></div><div>Why does the function change its behavior? 
Have you also run into this problem when writing CUDA code, and how did you solve it?<br></div><div>I tried copying the data to a new thrust::complex&lt;double&gt; matrix, but this is very time-consuming since my matrix is very big. Is there a way to create the Mat from the original data without changing the data type to thrust::complex&lt;double&gt; in CUDA applications? Any response will be appreciated. Thank you!<br></div><div><br></div><div>Best wishes,<br></div><div>Langtian Liu<br></div><div><br></div><div>------<br></div><div>Institute for Theoretical Physics, Justus-Liebig-University Giessen<br></div><div>Heinrich-Buff-Ring 16, 35392 Giessen Germany<br></div></blockquote></blockquote></blockquote></div></div></blockquote></div><div><br></div></body></html>