[Darshan-users] darshan3.1.5 issue on Cray XC40 cle6.up05
Phil Carns
carns at mcs.anl.gov
Mon Feb 26 09:30:49 CST 2018
Hi Wadud,
Usually the Darshan wrappers are perfectly safe for non-MPI programs (or
programs that use MPI only indirectly through other libraries). We
might need to refine some changes that we made in the linker options in
3.1.5 to make sure this is still true.
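For context, the __wrap_ symbols in the errors below come from the GNU
linker's symbol-wrapping mechanism, which Darshan uses to intercept MPI
calls at link time. A rough sketch of the idea (illustrative only; this
is not Darshan's actual wrapper source, and the exact options the
darshan software module injects on Shaheen may differ):

    /* wrap_demo.c: a toy MPI_Init wrapper.
     *
     * Illustrative link step (on a Cray PE, where cc resolves MPI
     * automatically):
     *     cc app.o wrap_demo.o -Wl,--wrap=MPI_Init
     *
     * --wrap=MPI_Init makes the linker resolve every reference to
     * MPI_Init against __wrap_MPI_Init instead, while references to
     * __real_MPI_Init resolve to the original MPI_Init in the MPI
     * library.
     */
    #include <stdio.h>

    int __real_MPI_Init(int *argc, char ***argv);   /* the real MPI_Init */

    int __wrap_MPI_Init(int *argc, char ***argv)
    {
        fprintf(stderr, "MPI_Init intercepted\n");  /* instrumentation hook */
        return __real_MPI_Init(argc, argv);         /* chain to the real call */
    }

If the wrap options end up on a link line that does not also provide
the __wrap_* definitions, the link fails with exactly the "undefined
reference to `__wrap_MPI_Init'" errors reported below.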
-Phil
On 02/26/2018 10:23 AM, Wadud Miah wrote:
>
> Hi,
>
> It doesn't look like the code is doing any MPI, and it is only using the
> compiler command (not the MPI compilation wrappers), hence the errors
> about undefined MPI symbols. There is nothing conclusive to show that
> this is a Darshan issue.
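>
> To illustrate the point: the unresolved __wrap_ symbols come from
> objects inside the libsci and dmapp archives rather than from the
> user's source, so a program that never calls MPI itself can still fail
> to link once wrapping is requested. A minimal sketch with hypothetical
> file names (not the actual Shaheen link line):
>
>     /* solver.c: stand-in for a library routine (like BLACS
>      * blacs_pinfo_) that calls MPI on the application's behalf. */
>     #include <stddef.h>
>     int MPI_Init(int *argc, char ***argv);
>     void solver_init(void) { MPI_Init(NULL, NULL); }
>
>     /* main.c: the "application" makes no MPI calls at all. */
>     void solver_init(void);
>     int main(void) { solver_init(); return 0; }
>
>     /* Illustrative link with wrapping requested but no wrapper
>      * library on the line:
>      *     cc main.o solver.o -Wl,--wrap=MPI_Init
>      * fails with: undefined reference to `__wrap_MPI_Init'
>      */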
>
> Wadud.
>
> *From:* Darshan-users [mailto:darshan-users-bounces at lists.mcs.anl.gov]
> *On Behalf Of* Phil Carns
> *Sent:* 26 February 2018 15:18
> *To:* darshan-users at lists.mcs.anl.gov
> *Subject:* Re: [Darshan-users] darshan3.1.5 issue on Cray XC40 cle6.up05
>
> Hi Bilel,
>
> Thanks for the bug report. We can try to reproduce it here and confirm.
>
> In the meantime, can you tell me if the test_scalapack.f90 code is
> using MPI or not?
>
> thanks,
> -Phil
>
> On 02/24/2018 02:25 PM, Bilel Hadri wrote:
>
> Dear Darshan colleagues,
>
> I recently installed Darshan 3.1.5 on Shaheen, a Cray XC40; we
> recently upgraded the OS to CLE6.up05 and are using the 17.12 PrgEnv.
>
> Compiling ScaLAPACK with Cray LibSci fails with the error shown
> below with all programming environments. A similar issue was observed
> for other codes, such as a simple PETSc code.
>
> After digging, it seems to be related to the Darshan 3.1.5 version
> recently installed on Shaheen. When Darshan is unloaded, the
> compilation works fine with no issue.
>
> The error does not appear when using Darshan 3.1.4.
>
> ftn -o exe test_scalapack.f90
>
> /opt/cray/dmapp/default/lib64/libdmapp.a(dmapp_internal.o): In
> function `_dmappi_is_pure_dmapp_job':
>
> /home/abuild/rpmbuild/BUILD/cray-dmapp-7.1.1/src/dmapp_internal.c:1401:
> undefined reference to `__wrap_MPI_Init'
>
> /opt/cray/pe/libsci/17.12.1/CRAY/8.6/x86_64/lib/libsci_cray_mpi_mp.a(blacs_exit_.o):
> In function `blacs_exit_':
>
> /b/worker/csml-libsci-sles/build/mp/scalapack/BLACS/SRC/blacs_exit_.c:42:
> undefined reference to `__wrap_MPI_Finalize'
>
> /opt/cray/pe/libsci/17.12.1/CRAY/8.6/x86_64/lib/libsci_cray_mpi_mp.a(blacs_pinfo_.o):
> In function `blacs_pinfo_':
>
> /b/worker/csml-libsci-sles/build/mp/scalapack/BLACS/SRC/blacs_pinfo_.c:18:
> undefined reference to `__wrap_MPI_Init'
>
> /opt/cray/pe/libsci/17.12.1/CRAY/8.6/x86_64/lib/libsci_cray_mpi_mp.a(blacs_pinfo_.oo):
> In function `Cblacs_pinfo':
>
> /b/worker/csml-libsci-sles/build/mp/scalapack/BLACS/SRC/blacs_pinfo_.c:18:
> undefined reference to `__wrap_MPI_Init'
>
> /opt/cray/pe/cce/8.6.5/binutils/x86_64/x86_64-pc-linux-gnu/bin/ld:
> link errors found, deleting executable `exe'
>
> /usr/bin/sha1sum: exe: No such file or directory
>
> hadrib at cdl1:~> ll /usr/bin/sha1
>
> sha1pass sha1sum
>
> hadrib at cdl1:~> ll /usr/bin/sha1sum
>
> -rwxr-xr-x 1 root root 43912 Aug 6 2016 /usr/bin/sha1sum
>
> hadrib at cdl1:~> which sha1sum
>
> /usr/bin/sha1sum
>
> hadrib at cdl1:~> module list
>
> Currently Loaded Modulefiles:
>
> 1)  modules/3.2.10.6
> 2)  eproxy/2.0.22-6.0.5.0_2.1__g1ebe45c.ari
> 3)  cce/8.6.5
> 4)  craype-network-aries
> 5)  craype/2.5.13
> 6)  cray-libsci/17.12.1
> 7)  udreg/2.3.2-6.0.5.0_13.12__ga14955a.ari
> 8)  ugni/6.0.14-6.0.5.0_16.9__g19583bb.ari
> 9)  pmi/5.0.13
> 10) dmapp/7.1.1-6.0.5.0_49.8__g1125556.ari
> 11) gni-headers/5.0.12-6.0.5.0_2.15__g2ef1ebc.ari
> 12) xpmem/2.2.4-6.0.5.0_4.8__g35d5e73.ari
> 13) job/2.2.2-6.0.5.0_8.47__g3c644b5.ari
> 14) dvs/2.7_2.2.52-6.0.5.2_17.6__g5170dea
> 15) alps/6.5.28-6.0.5.0_18.6__g13a91b6.ari
> 16) rca/2.2.16-6.0.5.0_15.34__g5e09e6d.ari
> 17) atp/2.1.1
> 18) perftools-base/7.0.0
> 19) PrgEnv-cray/6.0.4
> 20) cray-mpich/7.7.0
> 21) slurm/slurm
> 22) craype-haswell
> 23) texlive/2017
> 24) darshan/3.1.5
>
> hadrib at cdl1:~> module swap PrgEnv-cray/6.0.4 PrgEnv-intel
>
> hadrib at cdl1:~> ftn -o exe_i test_scalapack.f90
>
> /opt/cray/pe/libsci/17.12.1/INTEL/16.0/x86_64/lib/libsci_intel_mpi.a(blacs_exit_.o):
> In function `blacs_exit_':
>
> blacs_exit_.c:(.text+0xe9): undefined reference to
> `__wrap_MPI_Finalize'
>
> /opt/cray/pe/libsci/17.12.1/INTEL/16.0/x86_64/lib/libsci_intel_mpi.a(blacs_pinfo_.o):
> In function `blacs_pinfo_':
>
> blacs_pinfo_.c:(.text+0x9b): undefined reference to `__wrap_MPI_Init'
>
> /opt/cray/pe/libsci/17.12.1/INTEL/16.0/x86_64/lib/libsci_intel_mpi.a(blacs_pinfo_.oo):
> In function `Cblacs_pinfo':
>
> blacs_pinfo_.c:(.text+0x9b): undefined reference to `__wrap_MPI_Init'
>
> hadrib at cdl1:~>
>
> hadrib at cdl1:~>
>
> hadrib at cdl1:~> module swap PrgEnv-intel/6.0.4 PrgEnv-gnu
>
> PrgEnv-gnu PrgEnv-gnu/6.0.4
>
> hadrib at cdl1:~> module swap PrgEnv-intel/6.0.4 PrgEnv-gnu
>
> hadrib at cdl1:~> ftn -o exe_i test_scalapack.f90
>
> /opt/cray/pe/libsci/17.12.1/GNU/6.1/x86_64/lib/libsci_gnu_61_mpi.a(blacs_exit_.o):
> In function `blacs_exit_':
>
> blacs_exit_.c:(.text+0xdb): undefined reference to
> `__wrap_MPI_Finalize'
>
> /opt/cray/pe/libsci/17.12.1/GNU/6.1/x86_64/lib/libsci_gnu_61_mpi.a(blacs_pinfo_.o):
> In function `blacs_pinfo_':
>
> blacs_pinfo_.c:(.text+0xb3): undefined reference to `__wrap_MPI_Init'
>
> /opt/cray/pe/libsci/17.12.1/GNU/6.1/x86_64/lib/libsci_gnu_61_mpi.a(blacs_pinfo_.oo):
> In function `Cblacs_pinfo':
>
> blacs_pinfo_.c:(.text+0xb3): undefined reference to `__wrap_MPI_Init'
>
> =====
>
> implicit none
>
> integer :: n, nb                        ! problem size and block size
> integer :: myunit                       ! local output unit number
> integer :: myArows, myAcols             ! size of local subset of global array
> integer :: i,j, igrid,jgrid, iproc,jproc, myi,myj, p
> real*8, dimension(:,:), allocatable :: myA,myB,myC
> integer :: numroc                       ! blacs routine
> integer :: me, procs, icontxt, prow, pcol, myrow, mycol   ! blacs data
> integer :: info                         ! scalapack return value
> integer, dimension(9) :: ides_a, ides_b, ides_c   ! scalapack array desc
>
> open(unit=1,file="ABCp.dat",status="old",form="formatted")
> read(1,*)prow
> read(1,*)pcol
> read(1,*)n
> read(1,*)nb
> close(1)
>
> if (((n/nb) < prow) .or. ((n/nb) < pcol)) then
>    print *,"Problem size too small for processor set!"
>    stop 100
> endif
>
> call blacs_pinfo   (me,procs)
> call blacs_get     (0, 0, icontxt)
> call blacs_gridinit(icontxt, 'R', prow, pcol)
> call blacs_gridinfo(icontxt, prow, pcol, myrow, mycol)
>
> myunit = 10+me
> write(myunit,*)"--------"
> write(myunit,*)"Output for processor ",me," to unit ",myunit
> write(myunit,*)"Proc ",me,": myrow, mycol in p-array is ", &
>                myrow, mycol
>
> myArows = numroc(n, nb, myrow, 0, prow)
> myAcols = numroc(n, nb, mycol, 0, pcol)
>
> write(myunit,*)"Size of global array is ",n," x ",n
> write(myunit,*)"Size of block is ",nb," x ",nb
> write(myunit,*)"Size of local array is ",myArows," x ",myAcols
>
> allocate(myA(myArows,myAcols))
> allocate(myB(myArows,myAcols))
> allocate(myC(myArows,myAcols))
>
> do i=1,n
>    call g2l(i,n,prow,nb,iproc,myi)
>    if (myrow==iproc) then
>       do j=1,n
>          call g2l(j,n,pcol,nb,jproc,myj)
>          if (mycol==jproc) then
>             myA(myi,myj) = real(i+j)
>             myB(myi,myj) = real(i-j)
>             myC(myi,myj) = 0.d0
>          endif
>       enddo
>    endif
> enddo
>
> ! Prepare array descriptors for ScaLAPACK
> ides_a(1) = 1        ! descriptor type
> ides_a(2) = icontxt  ! blacs context
> ides_a(3) = n        ! global number of rows
> ides_a(4) = n        ! global number of columns
> ides_a(5) = nb       ! row block size
> ides_a(6) = nb       ! column block size
> ides_a(7) = 0        ! initial process row
> ides_a(8) = 0        ! initial process column
> ides_a(9) = myArows  ! leading dimension of local array
>
> do i=1,9
>    ides_b(i) = ides_a(i)
>    ides_c(i) = ides_a(i)
> enddo
>
> ! Call ScaLAPACK library routine
> call pdgemm('T','T',n,n,n,1.0d0, myA,1,1,ides_a, &
>             myB,1,1,ides_b,0.d0, &
>             myC,1,1,ides_c )
>
> ! Print results
> call g2l(n,n,prow,nb,iproc,myi)
> call g2l(n,n,pcol,nb,jproc,myj)
> if ((myrow==iproc) .and. (mycol==jproc)) &
>    write(*,*) 'c(',n,n,')=',myC(myi,myj)
>
> ! Deallocate the local arrays
> deallocate(myA, myB, myC)
>
> ! End blacs for processors that are used
> call blacs_gridexit(icontxt)
> call blacs_exit(0)
>
> end
>
> ! convert global index to local index in block-cyclic distribution
> subroutine g2l(i,n,np,nb,p,il)
>    implicit none
>    integer :: i    ! global array index, input
>    integer :: n    ! global array dimension, input
>    integer :: np   ! processor array dimension, input
>    integer :: nb   ! block size, input
>    integer :: p    ! processor array index, output
>    integer :: il   ! local array index, output
>    integer :: im1
>    im1 = i-1
>    p  = mod((im1/nb),np)
>    il = (im1/(np*nb))*nb + mod(im1,nb) + 1
>    return
> end
>
> ! convert local index to global index in block-cyclic distribution
> subroutine l2g(il,p,n,np,nb,i)
>    implicit none
>    integer :: il   ! local array index, input
>    integer :: p    ! processor array index, input
>    integer :: n    ! global array dimension, input
>    integer :: np   ! processor array dimension, input
>    integer :: nb   ! block size, input
>    integer :: i    ! global array index, output
>    integer :: ilm1
>    ilm1 = il-1
>    i = (((ilm1/nb) * np) + p)*nb + mod(ilm1,nb) + 1
>    return
> end
>
> -------
>
> Bilel Hadri, PhD
>
> Computational Scientist
>
> KAUST Supercomputing Lab
>
> Al Khawarizmi Bldg. (1) Office 126
>
> 4700 King Abdullah University of Science and Technology
>
> Thuwal 23955-6900
>
> Kingdom of Saudi Arabia
>
> Office Phone: +966 12 808 0654
>
> Cell Phone: + 966 544 700 893
>