[Darshan-users] About HPE MPI (MPT) support in Darshan (MPT ERROR: PMI2_Init)

pramod kumbhar pramod.s.kumbhar at gmail.com
Sat Aug 10 05:08:33 CDT 2019


I am quite late here but just to update this old thread:

The issue seems to be related to the TaskProlog setting in slurm.conf when HPE MPI
is used (even if the task prolog script is empty). For now, I work around
the issue by using the native mpirun launcher from HPE MPI (and avoiding all
SLURM environment variables).
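
Roughly, the workaround looks like this (a sketch only; the paths and process
count below are placeholders, not my exact setup):

  # hypothetical paths; adjust to your Darshan and MPT installation
  export LD_PRELOAD=/some_path/darshan-runtime-3.1.6-ryds66/lib/libdarshan.so
  # launch via MPT's own mpirun from inside the allocation, without srun
  mpirun -np 2 ./hello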

-Pramod


On Wed, Jul 25, 2018 at 8:58 PM Carns, Philip H. <carns at mcs.anl.gov> wrote:

> That's interesting that the segfault comes from the prolog rather than
> your MPI executable.
>
> Is there a way in Slurm to pass an LD_PRELOAD environment variable that
> *only* affects the user executable and not the prolog?
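> One possibility (an untested sketch, with the library path borrowed from your
> earlier message) would be a tiny wrapper script, so that only the application
> process tree gets the preload, e.g.:
>
>   #!/bin/bash
>   # run_with_darshan.sh -- hypothetical wrapper; set LD_PRELOAD only here
>   export LD_PRELOAD=/some_path/darshan-runtime-3.1.6-ryds66/lib/libdarshan.so
>   exec "$@"
>
> and then something like "srun -n 2 ./run_with_darshan.sh ./hello", so the
> prolog itself runs with a clean environment.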
>
> I'm not sure what's in the prolog, but it appears to be something that is
> incompatible with a Darshan library that's been built against HPE MPI.  If
> there is an MPI program in there, that would make sense; maybe there is a
> binary incompatibility.
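> As a quick sanity check (just a sketch, with a placeholder path), you could
> run ldd on the Darshan library; if it was linked against an MPI when it was
> built, ldd will show which one:
>
>   $ ldd /some_path/darshan-runtime-3.1.6-ryds66/lib/libdarshan.so | grep -i mpi
>
> If that points at something other than the MPT libraries in
> /opt/hpe/hpc/mpt/mpt-2.16/lib, a mismatch like that could explain it.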
>
> thanks,
> -Phil
>
>
>
> On 2018-07-21 05:25:08-04:00 Darshan-users wrote:
>
> Dear Kevin,
> Sorry for the delay in response. I looked into the docs but couldn't find
> anything specific. I tried:
> $ srun -n 2 --export=LD_PRELOAD=/some_path/darshan-runtime-3.1.6-ryds66/lib/libdarshan.so,LD_LIBRARY_PATH=/opt/hpe/hpc/mpt/mpt-2.16/lib ./hello
> MPT ERROR: PMI2_Init
> MPT ERROR: PMI2_Init
> srun: error: r2i0n34: task 1: Exited with exit code 255
> And the core file doesn't say anything useful; it's generated by the slurm prolog:
> warning: core file may not match specified executable file.
> [New LWP 72075]
> [New LWP 72080]
> Core was generated by `/bin/bash /etc/slurm/slurm.taskprolog'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00002aaaac1cc1e1 in ?? ()
> (gdb) bt
> Python Exception <class 'gdb.MemoryError'> Cannot access memory at address
> 0x7fffffffddf8:
> If I try to do:
> $ LD_PRELOAD=/gpfs/bbp.cscs.ch/data/project/proj16/kumbhar/soft/MPI_COMPARE/HPE_MPI/install/linux-rhel7-x86_64/intel-18.0.1/darshan-runtime-3.1.6-ryds66/lib/libdarshan.so ls
> MPT ERROR: PMI2_Init
> Let me know if you have any suggestions to debug this further.
> Regards,
> Pramod
>
>
> On Thu, Jul 5, 2018 at 5:54 PM, Harms, Kevin <harms at alcf.anl.gov> wrote:
>>
>> Pramod,
>>
>>   are there any environment variables that can be set to print out what
>> the error code is? Did you build libdarshan using HPE MPI?
>>
>> kevin
>>
>> ________________________________________
>> From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf
>> of pramod kumbhar <pramod.s.kumbhar at gmail.com>
>> Sent: Wednesday, July 4, 2018 7:54:21 AM
>> To: darshan-users at lists.mcs.anl.gov
>> Subject: [Darshan-users] About HPE MPI (MPT) support in Darshan (MPT
>> ERROR: PMI2_Init)
>>
>> Dear All,
>>
>> I was trying to use Darshan (3.1.6) with HPE MPI (MPT)<
>> https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00037728en_us&docLocale=en_US>
>> on our system and am seeing the error below:
>>
>> LD_PRELOAD=/gpfs/some_path/lib/libdarshan.so  srun -n 1
>> /gpfs/some_another_path/bin/ior -a MPIIO -b 1G -t 4M  -c  -i 3
>> MPT ERROR: PMI2_Init
>>
>> With other MPI implementations (e.g. Intel MPI), everything works fine.
>>
>> Do you have any suggestions or workarounds? Please let me know and I can
>> help debug/test the issue.
>>
>> Regards,
>> Pramod
>>
>>
>

