[Darshan-users] Darshan trace file not created for WRF
Snyder, Shane
ssnyder at mcs.anl.gov
Tue Nov 26 15:47:51 CST 2024
Hi Wadud,
Yeah, I would guess MPI_Finalize isn't being called, if you're sure you don't see any Darshan warnings about not being able to shut down properly. You could try running it in a debugger to confirm.
If we're sure the app isn't calling finalize, and we have no way to modify the source or otherwise get it to shut down cleanly, there is another option: you can force Darshan into non-MPI (i.e., single-process) mode by setting DARSHAN_ENABLE_NONMPI=1 in your environment. That causes it to generate an independent log for each MPI rank.
Having a log per process complicates analysis of larger MPI jobs, though, especially since Darshan analysis tools currently take a single log as input. You can use the darshan-merge utility to combine the per-process logs back into one:
darshan-merge --shared-redux --output combined-log.darshan <input log file glob>
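A minimal sketch of this fallback as a POSIX shell function. It assumes (this is not stated in the thread) that the per-process logs end up somewhere under $DARSHAN_LOGPATH with "wrf.exe" in their file names and a ".darshan" suffix; the function name merge_wrf_logs is hypothetical. It only uses the darshan-merge options shown above.

```shell
# Hypothetical helper: merge per-process Darshan logs produced in non-MPI
# mode (DARSHAN_ENABLE_NONMPI=1) into a single log with darshan-merge.
# Assumptions: log file names contain "wrf.exe", end in ".darshan",
# and contain no whitespace.
# Usage: merge_wrf_logs <log_dir> <output_log>
merge_wrf_logs() {
    log_dir=$1
    output=$2

    # Gather matching logs into the positional parameters (POSIX sh has no arrays).
    set --
    for f in $(find "$log_dir" -type f -name '*wrf.exe*.darshan' 2>/dev/null | sort); do
        set -- "$@" "$f"
    done

    if [ "$#" -eq 0 ]; then
        echo "merge_wrf_logs: no WRF Darshan logs found under $log_dir" >&2
        return 1
    fi

    # Same options as the darshan-merge command quoted above.
    darshan-merge --shared-redux --output "$output" "$@"
}
```

For example, export DARSHAN_ENABLE_NONMPI=1 before rerunning the original LD_PRELOAD/mpirun command, then run merge_wrf_logs "$DARSHAN_LOGPATH" combined-log.darshan once the job has finished.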
Thanks,
--Shane
________________________________
From: Wadud Miah <wadud.miah at ku.ac.ae>
Sent: Tuesday, November 26, 2024 3:33 AM
To: Snyder, Shane <ssnyder at mcs.anl.gov>; darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: RE: Darshan trace file not created for WRF
Hi Shane,
I had a look at the output and error files, and the only Darshan content I can find is the "#darshan:" and "darshan:init" lines below:
#darshan:<op> <nprocs> <time>
starting wrf task 48 of 104
darshan:init 104 0.045284
starting wrf task 72 of 104
This seems to imply that MPI_FINALIZE() isn’t being called, but the WRF application appears to have finished normally. Is there anything else you can suggest?
Regards,
Dr. Wadud Miah
Scientific Computing Support Senior Specialist
Scientific Computing Support
wadud.miah at ku.ac.ae
T : +971 2 312 5531
ku.ac.ae <http://www.ku.ac.ae/>
PO. Box 127788, Abu Dhabi, UAE
Khalifa University
"Disclaimer: This email and any attachments are intended solely for the use of the recipient(s) and may contain confidential or legally privileged information. If you are not the intended recipient, please notify the sender immediately and delete this email. Any unauthorized review, use, or distribution is prohibited. The views expressed in this email are those of the sender and may not reflect the official policies or position of Khalifa University of Science and Technology."
From: Snyder, Shane <ssnyder at mcs.anl.gov>
Sent: 16 November 2024 01:03
To: Wadud Miah <wadud.miah at ku.ac.ae>; darshan-users at lists.mcs.anl.gov
Subject: Re: Darshan trace file not created for WRF
One thing you can check is setting DARSHAN_INTERNAL_TIMING=1 in your environment. This makes Darshan print a little internal timing information at initialize/finalize time, which could give us a better clue about how far Darshan is getting.
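As a rough illustration, this check can be scripted: capture the job's combined stdout/stderr with DARSHAN_INTERNAL_TIMING=1 set, then look for any "darshan:" timing line other than the init line. This sketch assumes (it is not confirmed in this thread) that the lines Darshan prints on its finalize path also begin with "darshan:"; the function name darshan_reached_finalize is hypothetical.

```shell
# Hypothetical helper: inspect a captured job log produced with
# DARSHAN_INTERNAL_TIMING=1 and succeed only if Darshan printed timing
# lines beyond "darshan:init", i.e. it got past initialization.
# Assumption: finalize-time timing lines also start with "darshan:".
# Usage: darshan_reached_finalize <captured_output_file>
darshan_reached_finalize() {
    # Keep only Darshan timing lines (the "#darshan:" header does not match),
    # then count those that are not the init line.
    n=$(grep '^darshan:' "$1" | grep -vc '^darshan:init')
    [ "$n" -gt 0 ]
}
```

For example, add DARSHAN_INTERNAL_TIMING=1 to the environment of the original mpirun command, redirect its output with > wrf.log 2>&1, and run darshan_reached_finalize wrf.log afterwards. A non-zero exit status matches the symptom reported in this thread, where only the darshan:init line appeared.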
Just guessing, but is it possible the job isn't calling MPI_Finalize()? That's when Darshan generates the log file for MPI applications. Usually, I would expect some sort of warning message if the app does call MPI_Finalize() and Darshan doesn't succeed in generating the log file.
Using LD_PRELOAD is also usually pretty fail-safe in terms of ensuring Darshan properly intercepts what the app is doing. That said, is it possible the app is somehow statically linking the MPI library? You could run ldd <wrf_executable_path> and make sure you see libmpi.so listed. That would confirm MPI is linked dynamically, which should make it straightforward to instrument using LD_PRELOAD the way you are.
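The ldd check can be wrapped in a small helper. This is only a sketch: the function name mpi_is_dynamic is hypothetical, and it simply greps ldd's output for libmpi.so.

```shell
# Hypothetical helper: succeed if ldd lists libmpi.so among the executable's
# shared-library dependencies, i.e. MPI is linked dynamically and its calls
# can be intercepted via LD_PRELOAD.
# Usage: mpi_is_dynamic <executable_path>
mpi_is_dynamic() {
    # For static or missing files ldd prints an error (discarded here),
    # nothing matches, and the function fails.
    ldd "$1" 2>/dev/null | grep -q 'libmpi\.so'
}
```

For example: mpi_is_dynamic /apps/ku/gcc-9_4/openmpi-4_1/wrf/4.0/WRF/run/wrf.exe && echo "MPI is dynamically linked"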
--Shane
________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Wadud Miah <wadud.miah at ku.ac.ae>
Sent: Thursday, November 14, 2024 11:34 PM
To: darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: [Darshan-users] Darshan trace file not created for WRF
Hi,
I am trying to profile the I/O of WRF and have the following settings:
export DARSHAN_LOGPATH=/home/miahw/WRF_WADUD_PRE-COMPILED
LD_PRELOAD=/home/miahw/openmpi-4.1-darshan-3.4.5/lib/libdarshan.so mpirun -n $SLURM_NTASKS /apps/ku/gcc-9_4/openmpi-4_1/wrf/4.0/WRF/run/wrf.exe
However, no trace file gets created, and the job's stdout and stderr do not contain any Darshan warning or error messages. Can anyone suggest a way to increase the debug level to investigate this further?
Regards,
Dr. Wadud Miah