<div dir="ltr"><div><div>I am using MatLoad as in the ex7.c example provided with SLEPc. <br></div>ierr = MatLoad(A,viewer);CHKERRQ(ierr);<br><br><br></div><div>Is there any problem with this, or is an additional option required for very large matrices when running the eigensolver in parallel?<br><br></div><div>cheers,<br></div><div>Venkatesh<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, May 23, 2015 at 5:43 PM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="">On Sat, May 23, 2015 at 7:09 AM, venkatesh g <span dir="ltr"><<a href="mailto:venkateshgk.j@gmail.com" target="_blank">venkateshgk.j@gmail.com</a>></span> wrote:<br></span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div>Hi,<br></div>Thanks. <br></div><span class="">Each node has 24 cores with 4 GB RAM per core, and the job was submitted on 10 nodes.<br><br></span></div><span class="">So does it mean it requires 10G for one core, or for one node? <br></span></div></div></blockquote><div><br></div><div>The error message from MUMPS said that it tried to allocate 10G. We must assume each process</div><div>tried to do the same thing. That means if you scheduled 24 processes on a node, it would try to</div><div>allocate at least 240G, which is in excess of what you specify above.</div><div><br></div><div>Note that this has nothing to do with PETSc. 
It is all in the documentation for that machine and its</div><div>scheduling policy.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>cheers,<br><br></div>Venkatesh<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, May 23, 2015 at 5:17 PM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Sat, May 23, 2015 at 6:44 AM, venkatesh g <span dir="ltr"><<a href="mailto:venkateshgk.j@gmail.com" target="_blank">venkateshgk.j@gmail.com</a>></span> wrote:<br></span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><div>Hi, <br></div><span>The same eigenproblem runs with 120 GB RAM in a serial machine in Matlab. <br><br></span></div><span>In Cray I fired with 240*4 GB RAM in parallel. So it has to go in right ? <br></span></div></div></div></div></div></div></blockquote><div><br></div><div>I do not know how MUMPS allocates memory, but the message is unambiguous. Also,</div><div>this is concerned with the memory available per node. Do you know how many processes</div><div>per node were scheduled? The message below indicates that it was trying to allocate 10G</div><div>for one process.</div><span><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div></div>And for small matrices it is having negative scaling i.e 24 core is running faster. 
<br></div></div></div></div></div></blockquote><div><br></div></span><div>Yes, for strong scaling you always get slowdown eventually since overheads dominate</div><div>work, see Amdahl's Law.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div></div>I have attached the submission script. <br><br></div>Pls see.. Kindly let me know<br><br></div>cheers,<br></div>Venkatesh<br><div><div><div><div><div><div><br></div></div></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, May 23, 2015 at 4:58 PM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Sat, May 23, 2015 at 2:39 AM, venkatesh g <span dir="ltr"><<a href="mailto:venkateshgk.j@gmail.com" target="_blank">venkateshgk.j@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div>Hi again,<br><br></div>I have installed the Petsc and Slepc in Cray with intel compilers with Mumps. 
<br><br></div>I am getting this error when I solve eigenvalue problem with large matrices: [201]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: Cannot allocate required memory 9632 megabytes<br></div></div></div></blockquote><div><br></div></span><div>It ran out of memory on the node.</div><span><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div></div>Also it is again not scaling well for small matrices.<br></div></div></blockquote><div><br></div></span><div>MUMPS strong scaling for small matrices is not very good. Weak scaling is looking at big matrices.</div><div><div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>Kindly let me know what to do.<br><br></div><div>cheers,<br></div><div><br></div>Venkatesh<br><div><div><div><div><br></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 19, 2015 at 3:02 PM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Tue, May 19, 2015 at 1:04 AM, venkatesh g <span dir="ltr"><<a href="mailto:venkateshgk.j@gmail.com" target="_blank">venkateshgk.j@gmail.com</a>></span> wrote:<br></span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi,</div><div><br></div><span>I have attached the log of the command which I gave in the master node: make streams NPMAX=32<div><br></div><div>I dont know why it says 'It appears you have only 1 node'. 
But other codes run in parallel with good scaling on 8 nodes.<br></div></span></div></blockquote><div><br></div><div>If you look at the STREAMS numbers, you can see that your system is only able to support about 2 cores with the</div><div>available memory bandwidth. Thus for bandwidth constrained operations (almost everything in sparse linear algebra</div><div>and solvers), your speedup will not be bigger than 2.</div><div><br></div><div>Other codes may do well on this machine, but they would be compute constrained, using things like DGEMM.</div><div><br></div><div> Thanks,</div><div><br></div><div> Matt</div><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>Kindly let me know.<br></div><div><br></div><div>Venkatesh</div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 18, 2015 at 11:21 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Run the streams benchmark on this system and send the results. <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html#computers" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#computers</a><br>
<div><div><br>
<br>
> On May 18, 2015, at 11:14 AM, venkatesh g <<a href="mailto:venkateshgk.j@gmail.com" target="_blank">venkateshgk.j@gmail.com</a>> wrote:<br>
><br>
> Hi,<br>
> I have emailed the mumps-user list.<br>
> Actually the cluster has 8 nodes with 16 cores, and other codes scale well.<br>
> I wanted to ask: this job already takes a long time, and if I submit on more cores I have to increase icntl(14), which would again make it take a long time.<br>
><br>
> So is there another way ?<br>
><br>
> cheers,<br>
> Venkatesh<br>
><br>
> On Mon, May 18, 2015 at 7:16 PM, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>
> On Mon, May 18, 2015 at 8:29 AM, venkatesh g <<a href="mailto:venkateshgk.j@gmail.com" target="_blank">venkateshgk.j@gmail.com</a>> wrote:<br>
> Hi I have attached the performance logs for 2 jobs on different processors. I had to increase the workspace icntl(14) when I submit on more cores since it is failing with small value of icntl(14).<br>
><br>
> 1. performance_log1.txt is run on 8 cores (option given -mat_mumps_icntl_14 200)<br>
> 2. performance_log2.txt is run on 2 cores (option given -mat_mumps_icntl_14 85 )<br>
><br>
> 1) Your number of iterates increased from 7600 to 9600, but that is a relatively small effect<br>
><br>
> 2) MUMPS is just taking a lot longer to do forward/backward solve. You might try emailing<br>
> the list for them. However, I would bet that your system has enough bandwidth for 2 procs<br>
> and not much more.<br>
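The bandwidth remark above can be made concrete with a rough model. This is only a sketch, assuming a purely bandwidth-bound kernel whose speedup is capped by the ratio of saturated to single-process memory bandwidth. The function name and the bandwidth figures are hypothetical placeholders, not STREAMS results from this system.

```python
def bandwidth_bound_speedup(nprocs: int, single_proc_bw: float, saturated_bw: float) -> float:
    """Upper bound on speedup for a kernel limited only by memory bandwidth.

    single_proc_bw: bandwidth one process achieves alone (GB/s).
    saturated_bw:   bandwidth the node delivers once the memory bus is full (GB/s).
    """
    # Adding processes helps until the memory bus is saturated; after that
    # the extra processes just wait on memory.
    return min(float(nprocs), saturated_bw / single_proc_bw)


# Hypothetical figures: one process reaches 5 GB/s, the node saturates at 10 GB/s.
for p in (1, 2, 4, 8):
    print(p, bandwidth_bound_speedup(p, 5.0, 10.0))
```

With these placeholder numbers the bound stops growing at 2 processes, which is the situation described above for sparse solvers on a bandwidth-limited machine.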
><br>
> Thanks,<br>
><br>
> Matt<br>
><br>
> Venkatesh<br>
><br>
> On Sun, May 17, 2015 at 6:13 PM, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>
> On Sun, May 17, 2015 at 1:38 AM, venkatesh g <<a href="mailto:venkateshgk.j@gmail.com" target="_blank">venkateshgk.j@gmail.com</a>> wrote:<br>
> Hi, Thanks for the information. I now increased the workspace by adding '-mat_mumps_icntl_14 100'<br>
><br>
> It works. However, the problem is that if I submit on 1 core I get the answer in 200 secs, but with 4 cores and '-mat_mumps_icntl_14 100' it takes 3500 secs.<br>
><br>
> Send the output of -log_summary for all performance queries. Otherwise we are just guessing.<br>
><br>
> Matt<br>
><br>
> My command line is: 'mpiexec -np 4 ./ex7 -f1 a2 -f2 b2 -eps_nev 1 -st_type sinvert -eps_max_it 5000 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14 100'<br>
><br>
> Kindly let me know.<br>
><br>
> Venkatesh<br>
><br>
><br>
><br>
> On Sat, May 16, 2015 at 7:10 PM, David Knezevic <<a href="mailto:david.knezevic@akselos.com" target="_blank">david.knezevic@akselos.com</a>> wrote:<br>
> On Sat, May 16, 2015 at 8:08 AM, venkatesh g <<a href="mailto:venkateshgk.j@gmail.com" target="_blank">venkateshgk.j@gmail.com</a>> wrote:<br>
> Hi,<br>
> I am trying to solve the AX = lambda BX eigenvalue problem.<br>
><br>
> A and B are of sizes 3600x3600<br>
><br>
> I run with this command :<br>
><br>
> 'mpiexec -np 4 ./ex7 -f1 a2 -f2 b2 -eps_nev 1 -st_type sinvert -eps_max_it 5000 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps'<br>
><br>
> I get this error (I get a result only when I use 1 or 2 processors):<br>
> Reading COMPLEX matrices from binary files...<br>
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------<br>
> [0]PETSC ERROR: Error in external library!<br>
> [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-9, INFO(2)=2024<br>
><br>
><br>
> The MUMPS error types are described in Chapter 7 of the MUMPS manual. In this case you have INFO(1)=-9, which is explained in the manual as:<br>
><br>
> "–9 Main internal real/complex workarray S too small. If INFO(2) is positive, then the number of entries that are missing in S at the moment when the error is raised is available in INFO(2). If INFO(2) is negative, then its absolute value should be multiplied by 1 million. If an error –9 occurs, the user should increase the value of ICNTL(14) before calling the factorization (JOB=2) again, except if ICNTL(23) is provided, in which case ICNTL(23) should be increased."<br>
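The INFO(2) rule quoted from the manual can be sketched as a small helper. This is an illustration of the quoted text only; the function name is hypothetical and is not part of MUMPS or PETSc.

```python
def missing_workspace_entries(info2: int) -> int:
    """Entries missing from workarray S when MUMPS reports INFO(1) = -9."""
    # Per the quoted manual text: a positive INFO(2) is the count itself,
    # a negative INFO(2) means its absolute value times one million.
    if info2 >= 0:
        return info2
    return abs(info2) * 1_000_000


# INFO(2)=2024 from the error message in this thread.
print(missing_workspace_entries(2024))
# A negative value, chosen here only to show the scaling rule.
print(missing_workspace_entries(-3))
```

For the error reported here, INFO(2)=2024 is positive, so the workarray was short by 2024 entries at the moment the error was raised.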
><br>
> This says that you should use ICNTL(14) to increase the working space size:<br>
><br>
> "ICNTL(14) is accessed by the host both during the analysis and the factorization phases. It corresponds to the percentage increase in the estimated working space. When significant extra fill-in is caused by numerical pivoting, increasing ICNTL(14) may help. Except in special cases, the default value is 20 (which corresponds to a 20 % increase)."<br>
><br>
> So, for example, you can avoid this error via the following command line argument to PETSc: "-mat_mumps_icntl_14 30", where 30 indicates that we allow a 30% increase in the workspace instead of the default 20%.<br>
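The effect of the percentage can be sketched with simple arithmetic. This is a minimal illustration, assuming the workspace is the estimate scaled by (100 + ICNTL(14)) percent; the estimate of 1,000,000 entries and the function name are hypothetical, and the rounding MUMPS applies internally is not stated in this thread.

```python
def relaxed_workspace(estimated_entries: int, icntl14_percent: int = 20) -> int:
    """Workspace size after applying the ICNTL(14) percentage increase."""
    # Integer arithmetic keeps the result exact for whole-number percentages.
    return estimated_entries * (100 + icntl14_percent) // 100


estimate = 1_000_000  # hypothetical analysis-phase estimate, in entries
for percent in (20, 30, 100, 200):
    print(percent, relaxed_workspace(estimate, percent))
```

Going from the default 20 to 30 adds only a tenth of the estimate, whereas values such as 100 or 200 double or triple the requested workspace.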
><br>
> David<br>
><br>
><br>
><br>
><br>
><br>
><br>
> --<br>
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
> -- Norbert Wiener<br>
><br>
><br>
><br>
><br>
><br>
<br>
</div></div></blockquote></div><br></div>
</blockquote></div></div></div><div><div><br><br clear="all"><span><font color="#888888"><span><font color="#888888">
</font></span></font></span></div></div></div></div><span><font color="#888888">
</font></span></blockquote></div><span><font color="#888888"><br></font></span></div><span><font color="#888888">
</font></span></blockquote></div></div></div><span><font color="#888888"><div><div><br><br clear="all"><span><font color="#888888">
</font></span></div></div></font></span></div></div><span><font color="#888888">
</font></span></blockquote></div><span><font color="#888888"><br></font></span></div><span><font color="#888888">
</font></span></blockquote></div></div></div><span><font color="#888888"><div><div><br><br clear="all">
</div></div></font></span></div></div>
</blockquote></div><br></div>
</blockquote></div></div></div><div><div class="h5"><br><br clear="all">
</div></div></div></div>
</blockquote></div><br></div>
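The per-node memory estimate from the May 23 reply in this thread can be checked with a few lines of arithmetic. This is a sketch under the same assumption made in that reply, namely that every process on a node tries to allocate about 10 GB, and under the further assumption that nothing else on the node uses memory. The function names are hypothetical.

```python
def node_memory_demand_gb(procs_per_node: int, gb_per_proc: float) -> float:
    """Total memory requested on one node if every process allocates the same amount."""
    return procs_per_node * gb_per_proc


def max_procs_per_node(node_memory_gb: float, gb_per_proc: float) -> int:
    """Largest number of processes whose combined allocation fits on one node."""
    return int(node_memory_gb // gb_per_proc)


cores_per_node = 24
gb_per_core = 4
node_memory_gb = cores_per_node * gb_per_core  # 96 GB available per node

demand = node_memory_demand_gb(cores_per_node, 10)  # 240 GB requested per node
fits = max_procs_per_node(node_memory_gb, 10)       # 9 processes fit

print(node_memory_gb, demand, fits)
```

Under these assumptions a fully packed node asks for 240 GB against 96 GB available, so the job would need far fewer processes per node, which is a matter for the scheduler settings on that machine.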