[Darshan-users] darshan 3.3.0 issues
Thomas Breuer
t.breuer at fz-juelich.de
Wed May 19 10:33:20 CDT 2021
Hi Kevin,
thanks for the quick reply!
1. The Intel MPI version I mentioned is based on MPICH 3.3. I have
attached the log (iimpi_error_log.txt).
2. I have attached the code as well (hello_world.c); it just writes a
couple of lines to stdout.
Compilation command: mpicxx -fopenmp hello_world.c -o hello_world.exe
Hint: With OpenMPI/4.1.0rc2, both the configuration with APMPI and the
execution work. The PDF report seems to be created properly.
3. If I interpret that correctly, the data collected by APMPI are not
yet shown in the PDF report?
FYI: A couple of years ago I wrote a Python script that uses
darshan-parser to extract the raw data from the binary log file, i.e.
the data you use to create the PDF report. I was able to reproduce the
statistics shown in the PDF and added a few more tables, which helped
us get a deeper understanding of our applications' I/O at that time.
Since I have not touched this script for a long time, it might not work
anymore. That is why I am also interested in having a look at what
pydarshan offers.
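For illustration only (this is not the script mentioned above, which is
not attached): a minimal Python sketch of that kind of extraction. It
assumes the usual tab-separated darshan-parser layout of
<module> <rank> <record id> <counter> <value> ..., with header lines
starting with '#'. The SAMPLE text is made up, sum_counters is a
hypothetical helper name, and APMPI records may need different handling.

```python
from collections import defaultdict

# Made-up sample in the layout darshan-parser prints for POSIX records
# (assumption: tab-separated columns, '#' marks header/comment lines).
SAMPLE = "\n".join([
    "# darshan log version: 3.21",
    "#<module>\t<rank>\t<record id>\t<counter>\t<value>\t<file name>\t<mount pt>\t<fs type>",
    "POSIX\t0\t1234\tPOSIX_OPENS\t2\t/tmp/a.dat\t/tmp\ttmpfs",
    "POSIX\t1\t1234\tPOSIX_OPENS\t3\t/tmp/a.dat\t/tmp\ttmpfs",
    "POSIX\t0\t1234\tPOSIX_BYTES_WRITTEN\t4096\t/tmp/a.dat\t/tmp\ttmpfs",
])


def sum_counters(text):
    """Sum each (module, counter) value over all ranks and records."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#' header block
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a counter record in the assumed layout
        module, _rank, _record_id, counter, value = fields[:5]
        try:
            totals[(module, counter)] += float(value)
        except ValueError:
            continue  # value column is not numeric; ignore the line
    return dict(totals)


if __name__ == "__main__":
    for (module, counter), total in sorted(sum_counters(SAMPLE).items()):
        print(f"{module}\t{counter}\t{total:g}")
```

With a real log, the text would come from the standard output of
"darshan-parser <logfile>" instead of SAMPLE. Summing only makes sense
for additive counters (opens, bytes); timestamps and maxima would need
a different reduction.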
Thomas
On 19.05.2021 at 16:46, Harms, Kevin wrote:
> Thomas,
>
> 1. Not sure why the Intel MPI is tripping up on the configure check. I'm assuming it is MPI3 based. Can you send us the config.log output from that one? Maybe we can see why the check fails.
>
> 2. The partial log indicates the log is incorrect, so those parser errors are expected. I don't know why the finalize hangs. Was this a Fortran hello world example? I'm not familiar with ParaStationMPI but since it is based on MPICH, it should work. Can you send the test code and how you built it? We can try it on a system here.
>
> 3. AutoPerf can't be disabled at runtime yet. We have a broader plan to add the ability to enable/disable modules at runtime, but it is not available yet. We have tested AutoPerf with CrayMPI, MPICH 3.3 and OpenMPI. The systems we tested on were a generic Linux laptop, a Cray XC-40 and an Nvidia DGX A100. As for what can be done with APMPI data, we have a Python analysis script based on pydarshan.
>
> https://xgitlab.cels.anl.gov/AutoPerf/autoperf/-/blob/master/apmpi/util/apmpi-analysis.py
>
> The counters are also output by darshan-parser. We are still in the process of building more analysis based on this work.
>
> kevin
>
> ________________________________________
> From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Thomas Breuer <t.breuer at fz-juelich.de>
> Sent: Wednesday, May 19, 2021 6:52 AM
> To: darshan-users at lists.mcs.anl.gov
> Subject: [Darshan-users] darshan 3.3.0 issues
>
> Dear Darshan Team,
>
> I have installed the latest darshan version (3.3.0) for different MPIs on our HPC JUWELS (https://apps.fz-juelich.de/jsc/hps/juwels/configuration.html) and would like to report two issues:
>
> 1. Intel (19.1.3.304) Compiler with IntelMPI/2019.8.254:
> - Configure Step fails for the new APMPI feature:
> cd darshan-runtime; ./configure --prefix=/path/to/darshan-runtime/3.3.0-iimpi-2020-APMPI --with-mem-align=8 --with-log-path-by-env=DARSHAN_LOG_PATH --with-jobid-env=SLURM_JOBID CC=mpicc --enable-hdf5-mod=$EBROOTHDF5 --enable-apmpi-mod --enable-apmpi-coll-sync
> - Error msg: configure: error: APMPI module requires MPI version 3+
> - Without the new APMPI options the configure step ends successfully:
> cd darshan-runtime; ./configure --prefix=/p/software/juwels/stages/Devel-2020/software/darshan-runtime/3.3.0-iimpi-2020 --with-mem-align=8 --with-log-path-by-env=DARSHAN_LOG_PATH --with-jobid-env=SLURM_JOBID CC=mpicc --enable-hdf5-mod=$EBROOTHDF5
>
>
> 2. GCC/9.3.0 Compiler with ParaStationMPI/5.4.7-1 (based on MPICH 3.3.2) (https://github.com/ParaStation/psmpi/):
> - darshan-runtime configured with --enable-apmpi-mod --enable-apmpi-coll-sync
> - For a simple helloworld code (MPI + OMP) the application seems to be hanging in the MPI_FINALIZE call.
> - if I open the *.darshan_partial file with `darshan-parser`, then the following output is printed:
> Error: incompatible darshan file.
> Error: expected version 3.21, but got
> Error: failed to read darshan log file header.
> - There are no issues without APMPI.
>
> 3. Further questions:
> - Is it possible to switch on/off APMPI during runtime?
> - Are there any examples available that demonstrate the additional value that can be achieved by using the new AutoPerf feature?
> - Can you confirm that APMPI works on non-Cray systems?
>
> Best regards,
> Thomas
>
> --
> Thomas Breuer
>
> Division Application Support Forschungszentrum Jülich GmbH
> Jülich Supercomputing Centre (JSC) Wilhelm-Johnen-Straße
> http://www.fz-juelich.de/ias/jsc 52425 Jülich (Germany)
> Phone: +49 2461 61-96742 (currently not available via phone)
> Email: t.breuer at fz-juelich.de<mailto:t.breuer at fz-juelich.de>
>
> -------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Frauke Melchior
> -------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------
--
Thomas Breuer
Division Application Support Forschungszentrum Jülich GmbH
Jülich Supercomputing Centre (JSC) Wilhelm-Johnen-Straße
http://www.fz-juelich.de/ias/jsc 52425 Jülich (Germany)
Phone: +49 2461 61-96742 (currently not available via phone)
Email: t.breuer at fz-juelich.de
-------------- next part --------------
// hello_world.c: MPI + OpenMP timing test.
// Built with mpicxx (a C++ compiler wrapper), hence the <ctime> include.
#include <stdio.h>
#include "mpi.h"
#include <omp.h>
#include <ctime>
#include <sys/time.h>

int main(int argc, char *argv[]) {
    // Wall-clock time at program start.
    struct timeval begin, total;
    gettimeofday(&begin, NULL);
    printf("begin time = %f s\n", (begin.tv_sec+begin.tv_usec/1000000.0));

    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int thread = 0, np = 1;

    // Time spent in MPI_Init.
    struct timeval initStart, initEnd;
    gettimeofday(&initStart, NULL);
    MPI_Init(&argc, &argv);
    gettimeofday(&initEnd, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank=%d,initStart = %f s, initEnd = %f s,init time = %f s\n", rank, (initStart.tv_sec+initStart.tv_usec/1000000.0),
           (initEnd.tv_sec+initEnd.tv_usec/1000000.0),((initEnd.tv_sec-initStart.tv_sec)+(initEnd.tv_usec-initStart.tv_usec)/1000000.0));

    // Time between MPI_Init and MPI_Finalize.
    struct timeval afterInit, beforeFinal;
    gettimeofday(&afterInit, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Get_processor_name(processor_name, &namelen);

    // Each OpenMP thread queries its own id and the team size.
    #pragma omp parallel default(shared) private(thread, np)
    {
        np = omp_get_num_threads();
        thread = omp_get_thread_num();
        //printf("Hello from thread %d out of %d from process %d out of %d on %s\n",
        //       thread, np, rank, numprocs, processor_name);
    }

    gettimeofday(&beforeFinal, NULL);
    printf("rankInternal = %d, afterInit = %f s, beforeFinal = %f s, internalTime = %f s\n", rank, (afterInit.tv_sec+afterInit.tv_usec/1000000.0),
           (beforeFinal.tv_sec+beforeFinal.tv_usec/1000000.0),((beforeFinal.tv_sec-afterInit.tv_sec)+(beforeFinal.tv_usec-afterInit.tv_usec)/1000000.0));

    // With APMPI enabled under ParaStationMPI, the hang is observed in this call.
    MPI_Finalize();

    // Total wall-clock time, including MPI_Finalize.
    gettimeofday(&total, NULL);
    printf("rankEnd = %d, endtime = %f , total time = %f s\n", rank, (total.tv_sec+total.tv_usec/1000000.0), ((total.tv_sec-begin.tv_sec)+(total.tv_usec-begin.tv_usec)/1000000.0));
}
-------------- next part --------------
> cd darshan-runtime
> ./configure --prefix=/path/to/darshan-runtime/3.3.0-iimpi-2020-APMPI --with-mem-align=8 --with-log-path-by-env=DARSHAN_LOG_PATH --with-jobid-env=SLURM_JOBID CC=mpicc --enable-hdf5-mod=$EBROOTHDF5 --enable-apmpi-mod --enable-apmpi-coll-sync
checking for a BSD-compatible install... /usr/bin/install -c
checking whether to compile using MPI... yes
checking for gcc... mpicc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether mpicc accepts -g... yes
checking for mpicc option to accept ISO C89... none needed
checking for function MPI_Init... yes
checking for mpi.h... yes
checking how to run the C preprocessor... mpicc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking zlib.h usability... yes
checking zlib.h presence... yes
checking for zlib.h... yes
checking for inflateEnd in -lz... yes
checking for /path/to/darshan-3.3.0/darshan-runtime/../modules/autoperf/apmpi/darshan-apmpi-log-format.h... yes
checking for h5pcc... yes
checking for BG/Q environment... no
checking lustre/lustreapi.h usability... no
checking lustre/lustreapi.h presence... no
checking for lustre/lustreapi.h... no
checking for inttypes.h... (cached) yes
checking whether the inttypes.h PRIxNN macros are broken... no
checking for inttypes.h... (cached) yes
checking whether byte ordering is bigendian... no
checking for struct aiocb64... yes
checking for off64_t... yes
checking mntent.h usability... yes
checking mntent.h presence... yes
checking for mntent.h... yes
checking sys/mount.h usability... yes
checking sys/mount.h presence... yes
checking for sys/mount.h... yes
checking for MPI-IO support in MPI... yes
checking for MPI prototypes without const qualifier... yes
configure: error: APMPI module requires MPI version 3+