[petsc-dev] VecScatter scaling problem on KNL

Thu Mar 9 07:20:44 CST 2017

Great!  Let’s try it.
The yellow flag was raised by me, actually.
Theoretically, the large-scale XGC simulation should be much faster on Cori than on Titan (because the GPU’s theoretical peak is unrealistically high).
However, we found that XGC’s run speed on Cori was about the same as on Titan.
Hopefully, we will recover more reasonable efficiency against the theoretical peak on Cori.
CS

> On Mar 9, 2017, at 7:20 AM, Mark Adams <mfadams at lbl.gov> wrote:
> 
>>   Ok, in this situation VecScatter cannot detect that it is an all-to-all, so it will generate a message from each process to every other process. Given my past experience with Cray MPI (why do they even have their own MPI when Intel provides one; in fact why does Cray even exist when they just take other people's products and put their name on them) I am not totally surprised if the Cray MPI chokes on this flood of messages.
>> 
>>   1) Test with Intel MPI; perhaps it handles this case in a scalable way
>> 
>>    2) If Intel MPI also produces poor performance (interesting: how come this wasn't a bottleneck for the code on other systems in the past?) then the easiest solution is to separate the operation into two parts. Use VecScatterCreateToAll() to get all the data to all the processes, and then use another (purely sequential) VecScatter to get the data from this intermediate buffer into the final destination vector, which has the "extra" locations for the boundary conditions.
> 
> Yes, this is what I am thinking I will do. This sort of problem will
> only get worse, so we might as well do it at some point, and I would bet
> that we could just use Intel MPI to get this project moving now.
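
For readers following the thread, the two-part scatter proposed in (2) can be sketched as a toy model of the data movement. This is plain Python with no PETSc or MPI calls, so it illustrates only the idea and is not the actual implementation. The function names, the three-rank layout, the index lists, and the choice of slots 0 and 7 as the boundary-condition slots are all assumptions made up for illustration.

```python
def gather_to_all(chunks):
    """Stage 1: model of what VecScatterCreateToAll() provides.

    `chunks[r]` is the piece of the global vector owned by rank r.
    Every rank ends up with its own full copy of the global vector.
    """
    full = [v for chunk in chunks for v in chunk]
    return [list(full) for _ in chunks]  # one private copy per rank


def local_scatter(full, dest_size, src_idx, dst_idx, fill=0.0):
    """Stage 2: purely local indexed copy, no communication.

    Copies full[src_idx[k]] into dest[dst_idx[k]]. Slots of the
    destination that are not listed in dst_idx keep `fill`; these
    stand in for the "extra" boundary-condition locations.
    """
    dest = [fill] * dest_size
    for s, d in zip(src_idx, dst_idx):
        dest[d] = full[s]
    return dest


# Hypothetical layout: 3 ranks, each owning 2 entries of a 6-entry vector.
chunks = [[10.0, 11.0], [12.0, 13.0], [14.0, 15.0]]

# Stage 1: one collective gather instead of a message between every pair of ranks.
copies = gather_to_all(chunks)

# Stage 2: each rank fills an 8-slot destination; slots 0 and 7 are
# the assumed boundary-condition slots and are left untouched.
src_idx = [0, 1, 2, 3, 4, 5]
dst_idx = [1, 2, 3, 4, 5, 6]
dest_per_rank = [local_scatter(c, 8, src_idx, dst_idx) for c in copies]
```

The point of the split is that all communication is confined to the first stage, which the library can recognize as a collective, while the irregular index mapping that hid the all-to-all pattern becomes a local copy.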