[petsc-users] Scalability of PETSc on vesta.alcf

Roc Wang pengxwang at hotmail.com
Mon Jan 20 11:20:42 CST 2014



> From: jed at jedbrown.org
> To: pengxwang at hotmail.com
> CC: petsc-users at mcs.anl.gov
> Subject: RE: [petsc-users] Scalability of PETSc on vesta.alcf
> Date: Mon, 20 Jan 2014 09:59:47 -0700
> 
> Please always use "reply-all" so that your messages go to the list.
> This is standard mailing list etiquette.  It is important to preserve
> threading for people who find this discussion later and so that we do
> not waste our time re-answering the same questions that have already
> been answered in private side-conversations.  You'll likely get an
> answer faster that way too.
> 
> Roc Wang <pengxwang at hotmail.com> writes:
> 
> > Thanks, Jed
> >
> >
> >> From: jed at jedbrown.org
> >> To: pengxwang at hotmail.com; petsc-users at mcs.anl.gov
> >> Subject: Re: [petsc-users] Scalability of PETSc on vesta.alcf
> >> Date: Mon, 20 Jan 2014 08:33:29 -0700
> >> 
> >> Roc Wang <pengxwang at hotmail.com> writes:
> >> 
> >> > Hello, 
> >> >
> >> >    I am testing a PETSc program on vesta.alcf.anl.gov. The scalability
> >> >    was fine when the number of ranks was less than 1024. However, when
> >> >    2048 ranks were used, 
> >> 
> >> Are you using c64 mode in all cases, or did you run with fewer
> >> processes per node out to 1024?  You can't make a fair scaling
> >> comparison across different modes because BG/Q has only 16 cores per node. 
> > The 2048-rank run was in c64 mode, and the runs with fewer than 1024 ranks were in c1. 
> 
> That is completely different.  I recommend running c16 for all sizes;
> that should be efficient and reproducible.

  I tried c16 mode for 1024 and 2048 ranks, but the jobs could not run successfully: it seems the job started but the program never executed. Please take a look at the attached log file for the 1024-rank run in c16 mode. Could this be because I did not set some environment parameters correctly? Actually, the same program only runs with 1024 ranks in c1, c2, c32, and c64 modes, and with 2048 ranks in c64 mode. 
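
  In case it matters, the submission command I would expect to use for the c16 run is roughly the following (the project name, wall time, and executable are placeholders, not my exact command); since c16 puts 16 ranks on each 16-core node, a 1024-rank run should need 64 nodes:

      qsub -A MyProject -t 30 -n 64 --proccount 1024 --mode c16 ./my_petsc_app

  Please let me know if any of those settings look wrong for Vesta.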

> 
> >> The four hardware threads per core only cover latency, but do not significantly
> >> improve memory bandwidth.  
> > So the memory bandwidth in c64 mode stays the same as in c1, and that is what slows the computation down, right?
> >
> > I ran a case with 1024 ranks in c64 mode; the timing is 56.74 s, which
> > is longer than in c1 mode. So it should still be possible to get a
> > shorter computation time with 4096 ranks in the same c64 mode, compared
> > with 2048 (c64) and 1024 (c64), right?
> 
> Yes, that indicates that you're still scaling well.
> 
> >>We're seeing most of the time in MatSolve,
> >> which does no communication.  (Also MatMult, but your large fill makes
> >> the factors much heavier than the matrix itself.)
> >
> > Which fill did you mean is too large? Is there any way to reduce the large fill?
> 
> ILU(3).  Reduce the number of levels to reduce MatSolve time.
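
  Just to confirm I understand: that would mean lowering the fill level at run time, with options along these lines (the exact level is only a guess on my part, and if the ILU sits inside block Jacobi the option would be -sub_pc_factor_levels instead):

      -pc_type ilu -pc_factor_levels 1 -log_summary

  and then comparing the MatSolve time reported by -log_summary between the two runs, right?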
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p1024_p16_153427.cobaltlog
Type: application/octet-stream
Size: 1926 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20140120/4822cd02/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p1024_p16_153427.error
Type: application/octet-stream
Size: 2191 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20140120/4822cd02/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p1024_p16_153427.output
Type: application/octet-stream
Size: 27 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20140120/4822cd02/attachment-0002.obj>