<div dir="auto">Thanks for your response.<div dir="auto"><br></div><div dir="auto">My MPICH version is 3.0.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 18 Jun 2021, 6:24 am Snyder, Shane, <<a href="mailto:ssnyder@mcs.anl.gov" target="_blank" rel="noreferrer">ssnyder@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Hassan,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I'm assuming you don't see this error if Darshan isn't preloaded?<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I just built MADbench2 on my system, and it runs fine for me with Darshan preloaded and generates a log. I'm just running 16 processes on my laptop using MPICH 3.2.1.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Can you share more details about your setup? For starters, which MPI implementation and version are you using, which version of Darshan are you using, and how did you configure Darshan? Off the top of my head, I can't think of any reason Darshan would cause a crash in MPI_Comm_split, so we may need to find a way for me to reproduce the issue.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Thanks,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
--Shane<br>
</div>
<div id="m_-176047973476195906m_-7377708176640975596appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_-176047973476195906m_-7377708176640975596divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Darshan-users <<a href="mailto:darshan-users-bounces@lists.mcs.anl.gov" rel="noreferrer noreferrer" target="_blank">darshan-users-bounces@lists.mcs.anl.gov</a>> on behalf of Hassan Asghar <<a href="mailto:haxxanasghar@gmail.com" rel="noreferrer noreferrer" target="_blank">haxxanasghar@gmail.com</a>><br>
<b>Sent:</b> Wednesday, June 16, 2021 3:25 AM<br>
<b>To:</b> <a href="mailto:darshan-users@lists.mcs.anl.gov" rel="noreferrer noreferrer" target="_blank">darshan-users@lists.mcs.anl.gov</a> <<a href="mailto:darshan-users@lists.mcs.anl.gov" rel="noreferrer noreferrer" target="_blank">darshan-users@lists.mcs.anl.gov</a>><br>
<b>Subject:</b> [Darshan-users] COMM Split Error</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div style="font-family:arial,helvetica,sans-serif;font-size:small">
I am facing the following issue. Please help.</div>
<div style="font-family:arial,helvetica,sans-serif;font-size:small">
<br>
</div>
<div style="font-family:arial,helvetica,sans-serif;font-size:small">
[haxxanasghar@gpuserver2 ~]$ <b>mpiexec -n 16 -f machinefile -env LD_PRELOAD=/home/haxxanasghar/darshan/darshan-runtime/lib/libdarshan.so ./MADbench2 640 80 4 8 8 4 4</b><br>
<br>
MADbench 2.0 IO-mode<br>
no_pe = 16 no_pix = 640 no_bin = 80 no_gang = 4 sblocksize = 8 fblocksize = 8 r_mod = 4 w_mod = 4<br>
IOMETHOD = POSIX IOMODE = SYNC FILETYPE = UNIQUE REMAP = CUSTOM<br>
<br>
Fatal error in PMPI_Comm_split: A process has failed, error stack:<br>
PMPI_Comm_split(474)......: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=2, new_comm=0x6073e0) failed<br>
PMPI_Comm_split(456)......:<br>
MPIR_Comm_split_impl(143).:<br>
MPIR_Allgather_impl(807)..:<br>
MPIR_Allgather(766).......:<br>
MPIR_Allgather_intra(181).:<br>
dequeue_and_set_error(888): Communication error with rank 3<br>
MPIR_Allgather_intra(181).:<br>
dequeue_and_set_error(888): Communication error with rank 6<br>
Fatal error in PMPI_Comm_split: A process has failed, error stack:<br>
PMPI_Comm_split(474)......: MPI_Comm_split(MPI_COMM_WORLD, color=3, key=1, new_comm=0x6073e0) failed<br>
PMPI_Comm_split(456)......:<br>
MPIR_Comm_split_impl(143).:<br>
MPIR_Allgather_impl(807)..:<br>
MPIR_Allgather(766).......:<br>
MPIR_Allgather_intra(181).:<br>
dequeue_and_set_error(888): Communication error with rank 12<br>
MPIR_Allgather_intra(181).:<br>
dequeue_and_set_error(888): Communication error with rank 15<br>
MPIR_Allgather_intra(181).:<br>
dequeue_and_set_error(888): Communication error with rank 9<br>
<br>
===================================================================================<br>
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
= EXIT CODE: 1<br>
= CLEANING UP REMAINING PROCESSES<br>
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
===================================================================================<br>
[mpiexec@gpuserver2] HYDU_sock_read (./utils/sock/sock.c:243): read error (Bad file descriptor)<br>
[mpiexec@gpuserver2] control_cb (./pm/pmiserv/pmiserv_cb.c:201): unable to read command from proxy<br>
[mpiexec@gpuserver2] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status<br>
[mpiexec@gpuserver2] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event<br>
[mpiexec@gpuserver2] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion<br>
[haxxanasghar@gpuserver2 ~]$<br>
</div>
</div>
</div>
</div>
</blockquote></div>