[MPICH2-dev] problem with multithreaded version of MPI_Comm_split() in mpich2-1.03
Ryzhykh, Alexey
alexey.ryzhykh at intel.com
Mon Jun 26 12:19:03 CDT 2006
Rajeev,
I have checked your fix and both fixes together.
I have not seen any problems. I ran the test on a single machine and
on two nodes with 4, 5, 6, 7, 8, 16, 24, 32, and 64 threads.
Regards,
Alexey
________________________________
From: Rajeev Thakur [mailto:thakur at mcs.anl.gov]
Sent: Monday, June 26, 2006 7:21 PM
To: Ryzhykh, Alexey; mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: RE: [MPICH2-dev] problem with multithreaded version of
MPI_Comm_split() in mpich2-1.03
Alexey,
I had found this problem, but applied the fix in a different
place (below):
    if (mask_in_use || comm_ptr->context_id > lowestContextId) {
        memset( local_mask, 0, MAX_CONTEXT_MASK * sizeof(int) );
        own_mask = 0;    <----------------------------
        if (comm_ptr->context_id < lowestContextId) {
            lowestContextId = comm_ptr->context_id;
        }
But your test hangs with this fix for nthreads > 4 (on a single
machine).
I added your fix as well, but even then it hangs for nthreads > 4.
So I will have to look into it some more. Thanks for pointing out the
problem.
Rajeev
________________________________
From: owner-mpich2-dev at mcs.anl.gov
[mailto:owner-mpich2-dev at mcs.anl.gov] On Behalf Of Ryzhykh, Alexey
Sent: Monday, June 26, 2006 7:35 AM
To: mpich2-dev at mcs.anl.gov
Cc: Voronov, German; Supalov, Alexander
Subject: [MPICH2-dev] problem with multithreaded version of
MPI_Comm_split() in mpich2-1.03
Hi,
I work in the Parallel System & Applications group at Intel.
I used mpich2-1.0.3 with the sock device to run a multithreaded
MPI application.
mpich2-1.0.3 was built with support for the MPI_THREAD_MULTIPLE level.
The application creates a communicator for each thread by
means of the MPI_Comm_split() function.
These communicators are needed so that MPI messages on one
thread do not interfere with messages on another thread.
However, I sometimes see that the context_id field is the same for
communicators that belong to different threads.
I created a test (see the enclosed file test.tgz) to demonstrate
the problem. Note that this test uses internal mpich2-1.0.3
definitions to check the value of context_id.
Sometimes it fails as below:
nthreads=32
nthreads=32
[0] context_id=184 i=6 j=12
[1] context_id=184 i=6 j=12
test failed, number of errors is 1
test failed, number of errors is 1
The failure shows that context_id is the same for different
communicators.
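For readers without the attachment, the kind of check that produces the
lines above can be sketched as follows. This is only an illustration, not
the code from test.tgz: the function name count_duplicate_context_ids is
hypothetical, the IDs are passed in as plain integers instead of being read
from MPICH2 internal structures, and it assumes that i and j in the log are
the indices of the two communicators whose IDs collided (the message does
not say so explicitly).

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of test.tgz): given the context IDs
 * collected for n per-thread communicators on one process, report every
 * pair (i, j), i < j, that shares the same ID and return the number of
 * such pairs. The output format imitates the log shown above.
 */
int count_duplicate_context_ids(int rank, const int *ids, int n)
{
    int errors = 0;

    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (ids[i] == ids[j]) {
                /* Two different communicators ended up with one ID. */
                printf("[%d] context_id=%d i=%d j=%d\n",
                       rank, ids[i], i, j);
                errors++;
            }
        }
    }
    return errors;
}
```

A caller would fill the array with one ID per thread communicator after all
MPI_Comm_split() calls have returned, and treat any nonzero return value as
a test failure.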
I have prepared a simple fix for this bug; see the attached file
commutil.c.diff.
I believe the bug is fixed after applying this patch to
the source file mpich2-1.0.3/src/mpi/comm/commutil.c
With best regards,
Alexey Ryzhykh,
---
Intel, Sarov