[MPICH2-dev] problem with multithreaded version of MPI_Comm_split() in mpich2-1.03

Rajeev Thakur thakur at mcs.anl.gov
Mon Jun 26 12:22:33 CDT 2006


One difference is that I am running with the latest code in CVS. I replaced
your Comm_split with a Comm_dup (a simpler function), and even then it hangs
for nprocs=2 and nthreads=3.
 
Rajeev
 


  _____  

From: Ryzhykh, Alexey [mailto:alexey.ryzhykh at intel.com] 
Sent: Monday, June 26, 2006 12:19 PM
To: Rajeev Thakur; mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: RE: [MPICH2-dev] problem with multithreaded version of
MPI_Comm_split() in mpich2-1.03 



Rajeev,

I have checked your fix and both fixes together.

I have not seen any problems. I ran the test on a single machine and on
two nodes with 4, 5, 6, 7, 8, 16, 24, 32, and 64 threads.

 

Regards,

Alexey

 


  _____  


From: Rajeev Thakur [mailto:thakur at mcs.anl.gov] 
Sent: Monday, June 26, 2006 7:21 PM
To: Ryzhykh, Alexey; mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: RE: [MPICH2-dev] problem with multithreaded version of
MPI_Comm_split() in mpich2-1.03 

 

Alexey,

I had found this problem, but applied the fix in a different place
(below):

 

        if (mask_in_use || comm_ptr->context_id > lowestContextId) {
            memset( local_mask, 0, MAX_CONTEXT_MASK * sizeof(int) );
            own_mask        = 0;    /* <--- the fix */
            if (comm_ptr->context_id < lowestContextId) {
                lowestContextId = comm_ptr->context_id;
            }

 

But your test hangs with this fix for nthreads > 4 (on a single machine).
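
For readers following the fragment above, here is a simplified,
single-process model of mask-based context id allocation. It is a sketch
under stated assumptions, not the MPICH2 source: it assumes a thread either
snapshots the shared mask and becomes its owner, or contributes an all-zero
mask and must not pick an id afterwards, which is the role own_mask appears
to play in the fragment. MASK_WORDS, struct ctx_state, and the three function
names are hypothetical, and the cross-process reduction step is left out.

```c
#include <stdint.h>
#include <string.h>

#define MASK_WORDS 4 /* hypothetical size, stands in for MAX_CONTEXT_MASK */

/* Shared allocator state; threaded code would guard this with a lock. */
struct ctx_state {
    uint32_t mask[MASK_WORDS]; /* a set bit means that context id is free */
    int mask_in_use;           /* nonzero while some thread owns the mask */
};

/* Start one allocation attempt. Returns 1 if the caller now owns the mask,
 * 0 if the mask was busy; in the busy case the caller gets all zeros. */
static int snapshot_mask(struct ctx_state *s, uint32_t local_mask[MASK_WORDS])
{
    if (s->mask_in_use) {
        memset(local_mask, 0, MASK_WORDS * sizeof(uint32_t));
        return 0; /* corresponds to own_mask = 0 */
    }
    memcpy(local_mask, s->mask, MASK_WORDS * sizeof(uint32_t));
    s->mask_in_use = 1;
    return 1;
}

/* Index of the lowest set bit in the mask, or -1 if no bit is set. */
static int lowest_set_bit(const uint32_t mask[MASK_WORDS])
{
    for (int w = 0; w < MASK_WORDS; w++) {
        for (int b = 0; b < 32; b++) {
            if (mask[w] & ((uint32_t)1 << b)) {
                return w * 32 + b;
            }
        }
    }
    return -1;
}

/* Finish an attempt. Only the owner may pick an id and release the mask;
 * a non-owner gets -1 and is expected to retry. */
static int finish_attempt(struct ctx_state *s,
                          const uint32_t reduced[MASK_WORDS], int own_mask)
{
    if (!own_mask) {
        return -1;
    }
    int id = lowest_set_bit(reduced);
    if (id >= 0) {
        s->mask[id / 32] &= ~((uint32_t)1 << (id % 32)); /* mark id as taken */
    }
    s->mask_in_use = 0; /* release the mask for the next thread */
    return id;
}
```

In this model, a thread that gets 0 back from snapshot_mask() passes that 0
to finish_attempt(), receives -1, and retries. If a stale 1 were passed
instead, finish_attempt() would clear mask_in_use while the real owner still
holds the mask.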

 

I added your fix as well, but even then it hangs for nthreads > 4.

 

So I will have to look into it some more. Thanks for pointing out the
problem.

 

Rajeev

 

 


  _____  


From: owner-mpich2-dev at mcs.anl.gov [mailto:owner-mpich2-dev at mcs.anl.gov] On
Behalf Of Ryzhykh, Alexey
Sent: Monday, June 26, 2006 7:35 AM
To: mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: [MPICH2-dev] problem with multithreaded version of MPI_Comm_split()
in mpich2-1.03 

Hi,

I work in the Intel Parallel System & Applications group.

I used mpich2-1.0.3 with the sock device to run a multithreaded MPI
application.

MPICH2 1.0.3 was built to support the MPI_THREAD_MULTIPLE level.

The application creates a communicator for each thread with the
MPI_Comm_split() function.

Each thread needs its own communicator so that its MPI messages do not
get mixed up with messages on other threads.

But sometimes I see that the context_id field is the same for different
thread communicators.

I created a test (see the enclosed file test.tgz) to demonstrate the problem.
Note that this test uses internal mpich2-1.0.3 definitions to check the value
of context_id.

Sometimes it fails as below:

nthreads=32

nthreads=32

[0] context_id=184 i=6 j=12

[1] context_id=184 i=6 j=12

 test failed, number of errors is 1

 test failed, number of errors is 1

The failure shows that context_id is the same for different communicators.
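
The kind of check that produces such a report can be sketched as follows.
This is a hypothetical illustration, not the code in test.tgz: it assumes the
context ids of all thread communicators on one process have already been
collected into an array, and count_duplicate_ids is an invented name. The
message format follows the failure output above.

```c
#include <stddef.h>
#include <stdio.h>

/* Compare every pair of collected context ids and report each pair (i, j),
 * i < j, that shares the same value. Returns the number of such pairs. */
static int count_duplicate_ids(const unsigned *ids, size_t n, int rank)
{
    int errors = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (ids[i] == ids[j]) {
                printf("[%d] context_id=%u i=%zu j=%zu\n",
                       rank, ids[i], i, j);
                errors++;
            }
        }
    }
    return errors;
}
```

A return value of 0 means every communicator received a distinct context id;
any other value is the number of clashing pairs.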

 

I have prepared a simple fix for this bug; see the attached file
commutil.c.diff.

I believe the bug is fixed after applying this patch to the source
file mpich2-1.0.3/src/mpi/comm/commutil.c

 

With best regards,

Alexey Ryzhykh,

---

Intel, Sarov

 

 
