Hi,

I have two small (2500x2500) matrices parallelized across 480 processors. One is an MPIAIJ matrix and the other is an MPIDENSE matrix. I perform a MatMatMult involving these two matrices. I have tried these operations on two machines: the local cluster at the University of Michigan and the XSEDE Comet machine. Comet takes 10-20 times longer in the steps involving MatAssembly and MatMatMult of these matrices. Other PETSc MatMult operations in the same code, involving much larger matrices (4 million x 4 million), show similar timings on both machines; it is only the small parallel matrices whose timings are inconsistent. I used the same compilers and MPI libraries on both machines, except that I suppressed the "avx2" flag on Comet; I believe avx2 affects floating-point operations, not communication. What might be causing these inconsistencies only in the case of the small matrices? Are there any network settings that I can look into and compare?
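In case it helps, below is a stripped-down sketch of the operations in question, not my actual code (the matrix entries and preallocation are placeholders; I run the real code with -log_view to get the per-event timings I am comparing):

/* Sketch: C = A * B with A an MPIAIJ and B an MPIDENSE matrix,
 * both 2500 x 2500, spread over 480 ranks. Placeholder values only. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A, B, C;
  PetscInt       n = 2500, i, rstart, rend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* Sparse MPIAIJ matrix; a diagonal stands in for the real sparsity */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n,
                      1, NULL, 0, NULL, &A); CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend); CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES); CHKERRQ(ierr);
  }
  /* One of the assembly steps whose timing differs between machines */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  /* Dense MPIDENSE matrix */
  ierr = MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n,
                        NULL, &B); CHKERRQ(ierr);
  ierr = MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  /* The MatMatMult that is 10-20x slower on Comet */
  ierr = MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C); CHKERRQ(ierr);

  ierr = MatDestroy(&C); CHKERRQ(ierr);
  ierr = MatDestroy(&B); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}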
Regards,
Bikash

--
Bikash S. Kanungo
PhD Student
Computational Materials Physics Group
Mechanical Engineering
University of Michigan