[mpich-discuss] thread MPI calls

chong tan chong_guan_tan at yahoo.com
Wed Jul 29 15:19:19 CDT 2009


I'd like to provide further information on what we have experimented with; hopefully this
can be of some use, even to my future competitors (they subscribe to this list too).

1.  We replaced MPI_Waitall with MPI_Waitany, and there is no difference in performance.
2.  We experimented with pinning the receive thread to the same physical CPU (same core
    pair), to a different core pair, and to the same core as the main thread (the core-pair
    runs were done on AMD boxes). This experiment was done on:
          - Irecv versus Recv
          - Waitall and Waitany
          - early versus late Waitall, Waitany, and Wait
          - waiting in the receive thread versus waiting in the main thread
    Amazingly, running the receive thread on the same core as the main thread is the fastest
    variant; with some code combinations it almost matches the performance of the
    non-threaded implementation. (A minimal sketch of this receive-thread setup, covering
    the Waitall and Waitany variants, follows this list.)

    Early waits, both Waitall and Waitany, are the worst performers.

   Given that one particular test spends more than 5% of its time in the master process
   (rank 0) completing already-sent data via MPI_Recv (we know this from MPE and our own
   monitoring), and given that the master process gets the biggest chunk of work in that
   test, we expected to see some positive sign of life from threaded MPI, or at least not the
   negative gain we have experienced.
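
For reference, here is a minimal sketch (illustrative, not our actual code) of the
receive-thread setup described above: a dedicated thread, pinned to the same core as the
main thread, posts MPI_Irecv for each slave and then completes the requests either with
MPI_Waitall or with MPI_Waitany. The slave count, message size, tags, and the Linux-specific
pthread_setaffinity_np pinning are assumptions made for the example.

#define _GNU_SOURCE
#include <mpi.h>
#include <pthread.h>
#include <sched.h>

#define NSLAVES   4        /* illustrative: number of slave ranks */
#define MSG_COUNT 1024     /* illustrative message size */

static double recv_buf[NSLAVES][MSG_COUNT];

/* Linux-specific: pin the calling thread to one core, e.g. the same core
   the main thread runs on (the variant that was fastest in our tests). */
static void pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *recv_thread(void *arg)
{
    int core = *(int *)arg;
    MPI_Request req[NSLAVES];

    pin_to_core(core);

    /* Post one nonblocking receive per slave. */
    for (int i = 0; i < NSLAVES; i++)
        MPI_Irecv(recv_buf[i], MSG_COUNT, MPI_DOUBLE, i + 1 /* slave rank */,
                  0 /* tag */, MPI_COMM_WORLD, &req[i]);

#ifdef USE_WAITANY
    /* Waitany variant: consume each message as soon as it completes. */
    for (int done = 0; done < NSLAVES; done++) {
        int idx;
        MPI_Waitany(NSLAVES, req, &idx, MPI_STATUS_IGNORE);
        /* ... consume recv_buf[idx] ... */
    }
#else
    /* Waitall variant: block until every receive has completed. */
    MPI_Waitall(NSLAVES, req, MPI_STATUSES_IGNORE);
#endif
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank, core = 0;   /* assume the main thread sits on core 0 */
    pthread_t t;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                /* master: dedicated receive thread */
        pthread_create(&t, NULL, recv_thread, &core);
        /* ... the main thread does its own share of the work here ... */
        pthread_join(t, NULL);
    } else if (rank <= NSLAVES) {   /* slaves: one matching send each */
        double buf[MSG_COUNT] = {0};
        MPI_Send(buf, MSG_COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Compiling with -DUSE_WAITANY selects the Waitany variant; as noted above, the two variants
performed the same in our tests.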


One issue we observed is that Wait* causes significant system activity in the other processes.
Maybe this is the problem, maybe not.

tan


________________________________
From: Pavan Balaji <balaji at mcs.anl.gov>
To: mpich-discuss at mcs.anl.gov
Sent: Tuesday, July 28, 2009 7:24:49 PM
Subject: Re: [mpich-discuss] thread MPI calls


> we just completed one particular test using SERIALIZED, and that makes no difference (compared to
> MULTIPLE).

In that case, the problem is likely with the algorithm itself. In SERIALIZED mode, MPI does not add any locks and will not add any additional overhead. But it looks like your algorithm blocks waiting for data from all slave processes before proceeding to the next iteration -- this will cause a lot of idle time. Is there some way you can optimize your algorithm?
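
To make that concrete, here is a minimal sketch (illustrative, not MPICH-specific code) of
what SERIALIZED implies for the application: the program itself must ensure that only one
thread is inside an MPI call at any time, e.g. with its own mutex, because the library adds
no internal locking. The mutex and wrapper below are assumptions for the example.

#include <mpi.h>
#include <pthread.h>

static pthread_mutex_t mpi_lock = PTHREAD_MUTEX_INITIALIZER;

static void serialized_recv(void *buf, int count, int src)
{
    /* Application-level serialization: without MPI_THREAD_MULTIPLE,
       concurrent MPI calls from different threads are not allowed. */
    pthread_mutex_lock(&mpi_lock);
    MPI_Recv(buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    pthread_mutex_unlock(&mpi_lock);
}

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
    if (provided < MPI_THREAD_SERIALIZED) {
        /* the library cannot give us the requested level: abort (or fall back) */
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* ... spawn threads that funnel their MPI traffic through
           serialized_recv() or similar wrappers ... */
    MPI_Finalize();
    return 0;
}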

-- Pavan

-- Pavan Balaji
http://www.mcs.anl.gov/~balaji



      

