[petsc-users] possible performance issues with PETSc on Cray

Samar Khatiwala spk at ldeo.columbia.edu
Fri Apr 11 03:49:44 CDT 2014


Hello,

This is a somewhat vague query, but a colleague and I have been running PETSc (3.4.3.0) on a Cray 
XC30 in Germany (https://www.hlrn.de/home/view/System3/WebHome), and the system administrators 
alerted us to some anomalies with our jobs. These may or may not be related to PETSc, but I thought 
I'd ask here in case others have noticed something similar.

There was a large variation in run time for identical jobs, sometimes as much as 50%. We didn't 
really pick up on this ourselves, but other users complained to the IT people that their jobs were taking 
a performance hit with a similar variation in run time. At that point, we're told, the IT folks started 
monitoring jobs and carrying out tests to see what was going on. They discovered that (1) this always 
happened when we were running our jobs and (2) the problem got worse with physical proximity to the 
nodes on which our jobs were running (what they described as a "strong interaction" between our jobs 
and others, presumably through the communication network).

If it helps, our jobs essentially do nothing more than a series of (sparse) MatMult and MatMultTranspose 
operations (many thousands of times per run). There is very little I/O. The code has been running for years 
on many different systems without any such issue.

The system administrators have been at pains to tell me that they're not pointing fingers at PETSc. At 
this point we're just trying to explore different possibilities, and if someone has come across similar 
behavior it would help narrow things down.

I'm happy to provide further information.

Thanks very much!

Best,

Samar


