[Darshan-users] Darshan trace file not created for WRF

Wadud Miah wadud.miah at ku.ac.ae
Tue Nov 26 01:30:19 CST 2024


Hi Shane,

Thanks for the useful information. The binary is indeed dynamically linked to libmpi.so:

$ ldd /apps/ku/gcc-9_4/openmpi-4_1/wrf/4.0/WRF/run/wrf.exe | grep libmpi
        libmpi_usempif08.so.40 => /apps/ku/gcc-9_4/openmpi/4.1/lib/libmpi_usempif08.so.40 (0x00007fc091c85000)
        libmpi_usempi_ignore_tkr.so.40 => /apps/ku/gcc-9_4/openmpi/4.1/lib/libmpi_usempi_ignore_tkr.so.40 (0x00007fc091a7a000)
        libmpi_mpifh.so.40 => /apps/ku/gcc-9_4/openmpi/4.1/lib/libmpi_mpifh.so.40 (0x00007fc09180e000)
        libmpi.so.40 => /apps/ku/gcc-9_4/openmpi/4.1/lib/libmpi.so.40 (0x00007fc0914e2000)
        libmpi_cxx.so.40 => /apps/ku/gcc-9_4/openmpi/4.1/lib/libmpi_cxx.so.40 (0x00007fc090881000)

The application is definitely calling MPI_FINALIZE(), as I do not see any runtime error messages from Open MPI. If any of the processes failed to call it, Open MPI would report an error, and I am not seeing one. I will try DARSHAN_INTERNAL_TIMING=1 and let you know the outcome.

Regards,
Wadud.



Dr. Wadud Miah
Scientific Computing Support Senior Specialist
Scientific Computing Support








    wadud.miah at ku.ac.ae
    T : +971 2 312 5531
    ku.ac.ae <http://www.ku.ac.ae/>

PO. Box 127788, Abu Dhabi, UAE
Khalifa University



"Disclaimer: This email and any attachments are intended solely for the use of the recipient(s) and may contain confidential or legally privileged information. If you are not the intended recipient, please notify the sender immediately and delete this email. Any unauthorized review, use, or distribution is prohibited. The views expressed in this email are those of the sender and may not reflect the official policies or position of Khalifa University of Science and Technology."

From: Snyder, Shane <ssnyder at mcs.anl.gov>
Sent: 16 November 2024 01:03
To: Wadud Miah <wadud.miah at ku.ac.ae>; darshan-users at lists.mcs.anl.gov
Subject: Re: Darshan trace file not created for WRF


One thing you can check is setting DARSHAN_INTERNAL_TIMING=1 in your environment. This makes Darshan print a little internal library timing information at initialize/finalize time, which could give us a better clue about how far Darshan is getting.
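As a rough sketch of that suggestion, the run from the original message could be repeated with the variable exported first. The paths are copied from this thread; the `DARSHAN_LIB` and `WRF_EXE` variable names, the existence guard, and the fallback of 1 rank when `SLURM_NTASKS` is unset are assumptions added for illustration only.

```shell
# Paths taken from the original message in this thread.
DARSHAN_LIB=/home/miahw/openmpi-4.1-darshan-3.4.5/lib/libdarshan.so
WRF_EXE=/apps/ku/gcc-9_4/openmpi-4_1/wrf/4.0/WRF/run/wrf.exe

export DARSHAN_LOGPATH=/home/miahw/WRF_WADUD_PRE-COMPILED
# Ask Darshan to print internal timing information at initialize/finalize.
export DARSHAN_INTERNAL_TIMING=1

# Guard (an addition, not part of the original command): only launch
# where mpirun and the executable are actually present.
if command -v mpirun >/dev/null 2>&1 && [ -x "$WRF_EXE" ]; then
    # Falls back to 1 rank outside Slurm (assumed default).
    LD_PRELOAD="$DARSHAN_LIB" mpirun -n "${SLURM_NTASKS:-1}" "$WRF_EXE"
else
    echo "mpirun or $WRF_EXE not available; skipping launch" >&2
fi
```

If the timing lines show up in the job output, Darshan is at least being loaded and reaching its initialize and finalize code paths.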

Just guessing, but is it possible the job isn't calling MPI_Finalize()? That's when Darshan generates the log file for MPI applications. Usually, I would expect some sort of warning message if the app does call MPI_Finalize() and Darshan doesn't succeed in generating the log file.
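Since the log is written at MPI_Finalize(), one quick check after the job ends is to search the log directory for new log files. The helper below is a hypothetical sketch (the name `list_darshan_logs` is not part of Darshan); it assumes only that Darshan logs carry the `.darshan` suffix, and it searches recursively in case the library was configured to use dated subdirectories.

```shell
# Hypothetical helper: list Darshan log files under a directory.
# $1: directory to search; defaults to $DARSHAN_LOGPATH, then ".".
list_darshan_logs() {
    find "${1:-${DARSHAN_LOGPATH:-.}}" -type f -name '*.darshan' 2>/dev/null
}
```

For example, `list_darshan_logs /home/miahw/WRF_WADUD_PRE-COMPILED` prints nothing if no log was produced for any run.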

Using LD_PRELOAD is also usually pretty fail-safe in terms of ensuring Darshan properly intercepts what the app is doing. That said, is it possible the app is somehow statically linking the MPI library? You could run ldd <wrf_executable_path> and check that libmpi.so is listed. That would confirm MPI is linked dynamically, which should in theory be easy to instrument using LD_PRELOAD the way you are.
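The ldd check can be scripted so that it matches the core libmpi.so entry rather than the Fortran or C++ binding libraries. This is a minimal sketch; `has_dynamic_libmpi` is a hypothetical helper name, and the sample line is taken from ldd output posted in this thread.

```shell
# Hypothetical helper: reads ldd output on stdin and succeeds only if
# libmpi.so itself (not libmpi_mpifh.so, libmpi_cxx.so, ...) is listed.
has_dynamic_libmpi() {
    grep -Eq '(^|[[:space:]])libmpi\.so(\.[0-9]+)*[[:space:]]*=>'
}

# Demo on a line of ldd output quoted from this thread.
sample='        libmpi.so.40 => /apps/ku/gcc-9_4/openmpi/4.1/lib/libmpi.so.40 (0x00007fc0914e2000)'
if printf '%s\n' "$sample" | has_dynamic_libmpi; then
    echo "libmpi.so is dynamically linked"
fi
```

On a real system the input would come from `ldd <wrf_executable_path> | has_dynamic_libmpi`.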

--Shane
________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Wadud Miah <wadud.miah at ku.ac.ae>
Sent: Thursday, November 14, 2024 11:34 PM
To: darshan-users at lists.mcs.anl.gov
Subject: [Darshan-users] Darshan trace file not created for WRF


Hi,


I am trying to profile the I/O of WRF and have the following settings:


export DARSHAN_LOGPATH=/home/miahw/WRF_WADUD_PRE-COMPILED

LD_PRELOAD=/home/miahw/openmpi-4.1-darshan-3.4.5/lib/libdarshan.so mpirun -n $SLURM_NTASKS /apps/ku/gcc-9_4/openmpi-4_1/wrf/4.0/WRF/run/wrf.exe


However, no trace file gets created, and the job's stdout and stderr do not contain any Darshan warning or error messages. Can anyone suggest a way to increase the debug level to investigate this further?


Regards,




Dr. Wadud Miah
Scientific Computing Support Senior Specialist
Scientific Computing Support





