[Darshan-users] About HPE MPI (MPT) support in Darshan (MPT ERROR: PMI2_Init)

Carns, Philip H. carns at mcs.anl.gov
Mon Aug 12 07:41:52 CDT 2019


That's good to know, thanks for following up!

thanks,
-Phil
________________________________
From: pramod kumbhar <pramod.s.kumbhar at gmail.com>
Sent: Saturday, August 10, 2019 6:08 AM
To: Carns, Philip H. <carns at mcs.anl.gov>
Cc: Harms, Kevin <harms at alcf.anl.gov>; darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: Re: [Darshan-users] About HPE MPI (MPT) support in Darshan (MPT ERROR: PMI2_Init)

I am quite late here, but just to update this old thread:

The issue seems to be related to the TaskProlog setting in slurm.conf when HPE MPI is used (even if the task prolog script is empty). For now, I work around the issue by using the native mpirun launcher from HPE MPI (and avoiding all SLURM environment variables).

-Pramod

On Wed, Jul 25, 2018 at 8:58 PM Carns, Philip H. <carns at mcs.anl.gov> wrote:
It's interesting that the segfault comes from the prolog rather than from your MPI executable.

Is there a way in Slurm to pass an LD_PRELOAD environment variable that *only* affects the user executable and not the prolog?

I'm not sure what's in the prolog, but it appears to be something that is incompatible with a Darshan library that's been built against HPE MPI. If there is an MPI program in there, that would make sense; maybe there is a binary incompatibility.
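A minimal sketch of one way to do that, untested with HPE MPI on this system: launch the program through a small wrapper script that sets LD_PRELOAD itself, so the variable only exists in the environment of the process the wrapper execs. The script name `darshan-wrap.sh`, the `DARSHAN_LIB` variable and the `darshan_preload` function are hypothetical; the default path is the one used in this thread.

```shell
#!/bin/sh
# darshan-wrap.sh (hypothetical name): enable Darshan only for the command
# it launches. LD_PRELOAD is set here, inside the task, so the Slurm
# TaskProlog that runs before the task should not see it.

# DARSHAN_LIB is an assumption; point it at your own libdarshan.so build.
DARSHAN_LIB=${DARSHAN_LIB:-/some_path/darshan-runtime-3.1.6-ryds66/lib/libdarshan.so}

# Print the new LD_PRELOAD value: Darshan first, then anything already set.
darshan_preload() {
    printf '%s\n' "$DARSHAN_LIB${LD_PRELOAD:+:$LD_PRELOAD}"
}

# With no command given, do nothing, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    LD_PRELOAD=$(darshan_preload)
    export LD_PRELOAD
    # Replace this shell with the real program; only that program loads
    # libdarshan.so, never the /bin/sh running this script.
    exec "$@"
fi
```

Usage would then be something like `srun -n 2 ./darshan-wrap.sh ./hello`, with LD_PRELOAD not exported in the submitting shell, since srun propagates the caller's environment by default. Pramod's `ls` test in this thread suggests that any process preloading this libdarshan.so reports `MPT ERROR: PMI2_Init`, so the wrapper can only help if the wrapped program is itself an MPT program started by a launcher MPT supports.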

thanks,
-Phil



On 2018-07-21 05:25:08-04:00 Darshan-users wrote:

Dear Kevin,
Sorry for the delay in responding. I looked through the docs but couldn't find anything specific. I tried:
$ srun -n 2 --export=LD_PRELOAD=/some_path/darshan-runtime-3.1.6-ryds66/lib/libdarshan.so,LD_LIBRARY_PATH=/opt/hpe/hpc/mpt/mpt-2.16/lib ./hello
MPT ERROR: PMI2_Init
MPT ERROR: PMI2_Init
srun: error: r2i0n34: task 1: Exited with exit code 255
The core file doesn't say anything useful; it was generated by the Slurm task prolog:
warning: core file may not match specified executable file.
[New LWP 72075]
[New LWP 72080]
Core was generated by `/bin/bash /etc/slurm/slurm.taskprolog'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002aaaac1cc1e1 in ?? ()
(gdb) bt
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x7fffffffddf8:
If I try:
$ LD_PRELOAD=/gpfs/bbp.cscs.ch/data/project/proj16/kumbhar/soft/MPI_COMPARE/HPE_MPI/install/linux-rhel7-x86_64/intel-18.0.1/darshan-runtime-3.1.6-ryds66/lib/libdarshan.so ls
MPT ERROR: PMI2_Init
Let me know if you have any suggestions for debugging this further.
Regards,
Pramod


On Thu, Jul 5, 2018 at 5:54 PM, Harms, Kevin <harms at alcf.anl.gov> wrote:
Pramod,

Are there any environment variables that can be set to print out what the error code is? Did you build libdarshan using HPE MPI?
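One quick way to answer the second question, sketched here with a hypothetical helper named `mpi_deps`, is to ask the dynamic linker which MPI library libdarshan.so pulls in. A build against HPE MPI would be expected to list a libmpi from the MPT install tree rather than one from Intel MPI.

```shell
#!/bin/sh
# mpi_deps (hypothetical helper): print the ldd lines for any libmpi*
# dependency of a shared object or executable.
# Prints nothing if no such dependency is found (or if ldd fails).
mpi_deps() {
    ldd "$1" 2>/dev/null | grep -i 'libmpi' || true
}

# Example, using the path from this thread (adjust to your install):
#   mpi_deps /some_path/darshan-runtime-3.1.6-ryds66/lib/libdarshan.so
```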

kevin

________________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of pramod kumbhar <pramod.s.kumbhar at gmail.com>
Sent: Wednesday, July 4, 2018 7:54:21 AM
To: darshan-users at lists.mcs.anl.gov
Subject: [Darshan-users] About HPE MPI (MPT) support in Darshan (MPT ERROR: PMI2_Init)

Dear All,

I was trying to use Darshan (3.1.6) with HPE MPI (MPT, https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00037728en_us&docLocale=en_US) on our system and am seeing the error below:

LD_PRELOAD=/gpfs/some_path/lib/libdarshan.so  srun -n 1 /gpfs/some_another_path/bin/ior -a MPIIO -b 1G -t 4M  -c  -i 3
MPT ERROR: PMI2_Init

With other MPI implementations (e.g. Intel MPI), everything works fine.

Do you have any suggestions or workarounds? Please let me know; I can help debug and test the issue.

Regards,
Pramod


