[mpich-discuss] thread MPI calls

Darius Buntinas buntinas at mcs.anl.gov
Thu Jul 30 15:49:10 CDT 2009


OK.  Yes, unless do_work and do_very_little_work make any blocking calls
(like I/O), process 1 should have 100% CPU utilization: MPICH's blocking
calls (MPI_Recv, MPI_Wait, etc.) poll for incoming messages rather than
sleep, so a process waiting in one of them still spins.  This should be
fine from a performance standpoint, as long as you aren't
oversubscribing your processors.

I'm going to try to reproduce your tests on our machines.  How many
worker processes do you have?  Is this all on one node?  If not, how
many nodes?  How many cores do you have per node?
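
For reference, here is roughly what I plan to run: a minimal sketch of
the pattern from your diagram, assuming MPI_THREAD_MULTIPLE.  This is my
own reconstruction, not your code; do_work() and do_very_little_work()
are empty placeholders.

/* Sketch of the pattern as I understand it (my reconstruction, not
 * tan's code).  do_work() and do_very_little_work() are placeholders.
 * Requires an MPI library providing MPI_THREAD_MULTIPLE. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static void do_work(void) { /* placeholder for real work */ }
static void do_very_little_work(void) { /* placeholder */ }

static int recv_buf;

/* Receive thread on proc 0: post the receive and block in MPI_Wait
 * until proc 1's send arrives. */
static void *recv_thread(void *arg)
{
    MPI_Request req;
    (void)arg;
    MPI_Irecv(&recv_buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank, msg = 42;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        pthread_t t;
        pthread_create(&t, NULL, recv_thread, NULL);
        do_work();
        pthread_join(t, NULL);  /* main thread "blocked" until the
                                 * recv thread's wait completes */
        do_very_little_work();
        MPI_Send(&msg, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        do_work();
        MPI_Send(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        MPI_Recv(&msg, 1, MPI_INT, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}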

In the meantime, can you check which processor each process is bound
to?  Make sure that each process is bound to its own core, and not to a
hyperthread.
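
One quick way to check (a Linux-specific sketch of mine, not part of
your test; looking at taskset -cp <pid> for each process works too) is
to have each rank print its affinity mask:

/* Sketch only (Linux-specific): each rank prints the CPUs it is
 * allowed to run on, so you can verify the binding.  Build with mpicc. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, cpu;
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        printf("rank %d allowed on cpus:", rank);
        for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
            if (CPU_ISSET(cpu, &mask))
                printf(" %d", cpu);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}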

Thanks,
-d



On 07/30/2009 02:02 PM, chong tan wrote:
> D,
> sorry for the confusion.  In our application, the setup is different
> from the code Pavan posted.  I will try to line it up here (<--- is
> between threads, <==== is between procs):
>  
>             proc 0                                    proc 1
>
>   main thread             recv thread
>
>   do_work()               MPI_Irecv()                 do_work()
>                           MPI_Wait*()   <=======      MPI_Send()
>   blocked      <--- unblock
>   do_very_little_work()
>   MPI_Send()    =========================>            MPI_Recv()
>  
>  
> I don't know if the MPI_Recv call in Proc 1 is interfering with the
> MPI_Wait*() in Proc 0.  We see heavy system activity in Proc 1.
>
> tan
>
> ------------------------------------------------------------------------
> *From:* Darius Buntinas <buntinas at mcs.anl.gov>
> *To:* mpich-discuss at mcs.anl.gov
> *Sent:* Thursday, July 30, 2009 11:17:52 AM
> *Subject:* Re: [mpich-discuss] thread MPI calls
> 
> That sounds fishy.  If process 1 is doing a sleep(), you shouldn't see
> any activity from that process!  Can you double check that?
> 
> -d
> 
> On 07/30/2009 01:05 PM, chong tan wrote:
>> Pavan,
>> the behavior you described is the expected behavior.  However, using
>> your example, we are also seeing
>> a lot of system activity in process 1 in all of our experiments.  That
>> contributes significantly
>> to the negative gain.
>> 
> 

