<div dir="ltr">On Tue, Jan 29, 2013 at 8:11 PM, Gaurish Telang <span dir="ltr"><<a href="mailto:gaurish108@gmail.com" target="_blank">gaurish108@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hello, <div><br></div><div>I am trying to solve some over-determined / under-determined least-squares systems in parallel using the LSQR routine of PETSc. </div>
<div>In spite of being able to solve the least squares problem || Ax-b || _correctly_ with this Krylov method on multiple processors, I don't seem to be acheiving scalability with respect to wall-clock time</div>
<div>for the set of matrices that I am interested in. </div><div><br></div><div>Below, I have listed the matrix sizes, the number of non-zeros in each matrix and the wall-clock times to required for 10 iterations of the LSQR krylov method. </div>
<div>The timings have been amortized over 50 iterations of the solver i.e. I solve the least-square problem 50 times ( where each time 10 iterations of LSQR are carried out) </div><div>and obtain the arithmetic mean of wall-clock times so recorded (i.e (sum-of-clock-times)/50).</div>
Never ask performance questions without sending the output of -log_summary.

Also:

  http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
  http://www.mcs.anl.gov/petsc/documentation/faq.html#slowerparallel
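For example, appending -log_summary to the run line you give below (a sketch
reusing your executable name and options, with nothing else changed) makes
PETSc print the full performance breakdown when PetscFinalize() is called:

    mpirun -np <num-procs> ./test_parallel -fmat <binary-mat-file> \
        -frhs <binary-vec-file> -ksp_type lsqr -ksp_max_it 10 -log_summary

For each logged event, that output shows the time, flop rate, and message
counts on every process, which is what we need to see where the time
actually goes.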

   Matt
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>Matrix size: 28407 x 19899</div><div>Non-zeros: 725363</div>
<div>Wall clock times </div><div>1 proc: 0.094 s</div><div>2 proc: 0.087 s</div><div>3 proc: 0.089 s</div><div>4 proc: 0.093 s</div>
<div><br></div><div>Matrix size: 95194 x 155402</div><div><div>Non-zeros: 3877164</div><div>Wall clock times :</div><div>1 proc: 0.23 s</div><div>2 proc: 0.21 s</div><div>3 proc: 0.22 s</div><div>4 proc: 0.23 s</div></div>
<div><br></div><div>Matrix size: 125878 x 207045</div><div><div>Non-zeros: 3717995</div><div>Wall clock times </div><div>1 proc: 0.24 s</div><div>2 proc: 0.22 s</div><div>3 proc: 0.23 s</div><div>4 proc: 0.24 s</div></div>
>
> I have other matrices which show similarly bad scaling as I increase the
> processor count.
>
> Please let me know what I can do to improve performance as the processor
> count increases. I suspect it is a data-layout problem, i.e. I need to
> choose some other PETSc data structure to represent the matrix and vector
> of my least-squares problem.
>
> Currently my Mat type is "mpiaij" and my Vec type is "mpi", both set on the
> command line when running the executable:
>
>   mpirun -np <num-procs> ./test_parallel \
>       -fmat <binary-mat-file> \
>       -frhs <binary-vec-file> \
>       -vec_type mpi \
>       -mat_type mpiaij \
>       -ksp_type lsqr \
>       -ksp_max_it 10
>
> I have pasted the code I am using below for your reference.
>
> Thank you,
> Gaurish.
>
> ==============================================================================
>
> static char help[] = "Solves a least-squares problem in parallel using LSQR.\n";
>
> #include <petscksp.h>
> #include <petscsys.h>
> #include <petsctime.h>
> #include <stdio.h>
>
> #undef __FUNCT__
> #define __FUNCT__ "main"
> int main(int argc,char **args)
> {
>   Vec            x, b, residue;   /* approx solution, RHS, residual */
>   Mat            A;               /* linear system matrix */
>   KSP            ksp;             /* linear solver context */
>   PC             pc;              /* preconditioner context */
>   PetscErrorCode ierr;
>   PetscInt       m,n;             /* number of rows and columns of the matrix read in */
>   PetscViewer    fd;
>   PetscMPIInt    size;
>   /* PetscInt    its; */
>   /* PetscScalar norm, tol=1.e-5; */
>   PetscBool      flg;
>   char           fmat[PETSC_MAX_PATH_LEN];   /* input file name: matrix */
>   char           frhs[PETSC_MAX_PATH_LEN];   /* input file name: right-hand side */
>
>   PetscInitialize(&argc,&args,(char*)0,help);
>   ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);
>
>   ierr = PetscOptionsGetString(PETSC_NULL,"-fmat",fmat,PETSC_MAX_PATH_LEN,&flg);CHKERRQ(ierr);
>   ierr = PetscOptionsGetString(PETSC_NULL,"-frhs",frhs,PETSC_MAX_PATH_LEN,&flg);CHKERRQ(ierr);
>
>   /* Read in the matrix */
>   ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,fmat,FILE_MODE_READ,&fd);CHKERRQ(ierr);
>   ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
>   /* ierr = MatSetType(A,MATSEQAIJ); */
>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>   ierr = MatLoad(A,fd);CHKERRQ(ierr);
>   ierr = PetscViewerDestroy(&fd);CHKERRQ(ierr);
>
>   /* Read in the right-hand side */
>   ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,frhs,FILE_MODE_READ,&fd);CHKERRQ(ierr);
>   ierr = VecCreate(PETSC_COMM_WORLD,&b);CHKERRQ(ierr);
>   ierr = VecSetFromOptions(b);CHKERRQ(ierr);
>   ierr = VecLoad(b,fd);CHKERRQ(ierr);
>   ierr = PetscViewerDestroy(&fd);CHKERRQ(ierr);
>
>   /* Get the matrix size; used for setting the vector sizes */
>   ierr = MatGetSize(A,&m,&n);CHKERRQ(ierr);
>   /* printf("The size of the matrix read in is %d x %d\n",m,n); */
>
>   ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
>   ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
>   ierr = VecSetFromOptions(x);CHKERRQ(ierr);
>
>   ierr = VecCreate(PETSC_COMM_WORLD,&residue);CHKERRQ(ierr);
>   ierr = VecSetSizes(residue,PETSC_DECIDE,m);CHKERRQ(ierr);
>   ierr = VecSetFromOptions(residue);CHKERRQ(ierr);
>
>   /* Set the solver type at the command line */
>   ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
>   ierr = KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
>   ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
>   ierr = PCSetType(pc,PCNONE);CHKERRQ(ierr);
>   ierr = KSPSetTolerances(ksp,1.e-5,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
>
>   /* Set runtime options, e.g.
>        -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
>      These options will override those specified above as long as
>      KSPSetFromOptions() is called _after_ any other customization routines. */
>   ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>
>   /* Initial guess for the Krylov method set to zero */
>   PetscScalar p = 0;
>   ierr = VecSet(x,p);CHKERRQ(ierr);
>   ierr = KSPSetInitialGuessNonzero(ksp,PETSC_TRUE);CHKERRQ(ierr);
>
>   /* Solve the linear system 50 times, timing each solve */
>   PetscLogDouble v1,v2,elapsed_time[50];
>   int i;
>   for (i = 0; i < 50; ++i) {
>     ierr = PetscGetTime(&v1);CHKERRQ(ierr);
>     ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
>     ierr = PetscGetTime(&v2);CHKERRQ(ierr);
>     elapsed_time[i] = v2 - v1;
>     PetscPrintf(PETSC_COMM_WORLD,"[%d] Time for the solve: %g s\n",i,elapsed_time[i]);
>   }
>
>   /* Report the mean wall-clock time over the 50 solves */
>   PetscLogDouble sum = 0,amortized_time;
>   for (i = 0; i < 50; ++i) sum += elapsed_time[i];
>   amortized_time = sum/50;
>   PetscPrintf(PETSC_COMM_WORLD,"\n\n***********************\nAmortized Time for the solve: %g s\n***********************\n\n",amortized_time);
>
>   /* View solver info; we could instead use the option -ksp_view to print
>      this info to the screen at the conclusion of KSPSolve(). */
>   ierr = KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
>
>   /* Clean up */
>   ierr = VecDestroy(&x);CHKERRQ(ierr);
>   ierr = VecDestroy(&residue);CHKERRQ(ierr);
>   ierr = VecDestroy(&b);CHKERRQ(ierr);
>   ierr = MatDestroy(&A);CHKERRQ(ierr);
>   ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
>
>   ierr = PetscFinalize();
>   return 0;
> }
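One more suggestion on the timing code: rather than timing KSPSolve() by hand
with PetscGetTime(), you can wrap the solve loop in a logging stage so that
-log_summary reports the solves separately from setup and file I/O. A minimal
sketch (untested, meant to drop in around your existing loop):

    PetscLogStage stage;
    ierr = PetscLogStageRegister("LSQR solves",&stage);CHKERRQ(ierr);
    ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
    for (i = 0; i < 50; ++i) {
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);  /* the solves being timed */
    }
    ierr = PetscLogStagePop();CHKERRQ(ierr);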
</blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener
</div></div>