[MPICH2-dev] problem with multithreaded version of MPI_Comm_split() in mpich2-1.03

Rajeev Thakur thakur at mcs.anl.gov
Mon Jun 26 10:21:19 CDT 2006


Alexey,

I had found this problem, but applied the fix in a different place (shown
below):
 
        if (mask_in_use || comm_ptr->context_id > lowestContextId) { 
            memset( local_mask, 0, MAX_CONTEXT_MASK * sizeof(int) ); 
            own_mask        = 0;            /* <-- fix applied here */
            if (comm_ptr->context_id < lowestContextId) { 
                lowestContextId = comm_ptr->context_id;  
            }      
 
But your test hangs with this fix for nthreads > 4 (on a single machine).
 
I added your fix as well, but even then it hangs for nthreads > 4.
 
So I will have to look into it some more. Thanks for pointing out the
problem.
 
Rajeev
 


  _____  

From: owner-mpich2-dev at mcs.anl.gov [mailto:owner-mpich2-dev at mcs.anl.gov] On
Behalf Of Ryzhykh, Alexey
Sent: Monday, June 26, 2006 7:35 AM
To: mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: [MPICH2-dev] problem with multithreaded version of MPI_Comm_split()
in mpich2-1.03 



Hi,

I work in the Parallel System & Applications group at Intel.

I used mpich2-1.0.3 with the sock device to run a multithreaded MPI
application.

mpich2-1.0.3 was built to support the MPI_THREAD_MULTIPLE thread level.

The application creates a communicator for each thread by calling
MPI_Comm_split().

Each thread needs its own communicator so that MPI messages on one thread
do not get mixed up with messages on another thread.

However, I sometimes see the same context_id value in the communicators of
different threads.

I created a test (see the enclosed file test.tgz) to demonstrate the problem.
Note that this test uses internal mpich2-1.0.3 definitions to check the value
of context_id.

Sometimes it fails as below:

nthreads=32

nthreads=32

[0] context_id=184 i=6 j=12

[1] context_id=184 i=6 j=12

 test failed, number of errors is 1

 test failed, number of errors is 1

The failure shows that context_id is the same for different communicators.
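
For illustration, a check of this kind can be sketched as a pairwise
comparison of the context IDs collected from the per-thread communicators.
This is only a minimal sketch, not the code from test.tgz: the function name
count_duplicate_context_ids and its parameters are hypothetical, and it
assumes the context IDs have already been read into a plain array (the real
test obtains them from MPICH2 internal structures).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from test.tgz: compares every pair of
 * context IDs gathered from the per-thread communicators and reports
 * each pair that collides. Returns the number of colliding pairs.
 */
int count_duplicate_context_ids(int rank, const int *context_ids, int n)
{
    int errors = 0;

    /* An empty list is allowed; otherwise the array must be valid. */
    assert(context_ids != NULL || n == 0);

    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (context_ids[i] == context_ids[j]) {
                /* Output format mirrors the failure log quoted above. */
                printf("[%d] context_id=%d i=%d j=%d\n",
                       rank, context_ids[i], i, j);
                errors++;
            }
        }
    }
    return errors;
}
```

A caller would pass its MPI rank and the array of IDs, one entry per thread
communicator; a nonzero return value means at least two communicators share
a context_id.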

 

I have prepared a simple fix for this bug; see the attached file
commutil.c.diff.

I believe the bug is fixed after applying this patch to the source
file mpich2-1.0.3/src/mpi/comm/commutil.c

 

With best regards,

Alexey Ryzhykh,

---

Intel, Sarov

 

 

