newbie errors

Rob Ross rross at mcs.anl.gov
Thu Apr 8 12:12:11 CDT 2004


Joseph,

Likewise, cc'ing this back to parallel-netcdf.

How many processes do you need in order for the code to exhibit this 
behavior?  Could you run as a single process and gdb it?  Can you get a 
core file and see where it crashed?
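
To narrow down where the failure happens, it can also help to reproduce
the create/close sequence outside the full application.  What follows is
only a minimal sketch, under several assumptions: that mpif.h and
pnetcdf.inc are both on the include path, and that the Fortran interface
takes the create mode before the info object (worth checking against the
installed pnetcdf.inc).  The program name create_test and the file name
test.nc are made up for illustration.

```fortran
      program create_test
      implicit none
! mpif.h supplies MPI_COMM_WORLD and MPI_INFO_NULL; without it those
! names would be undefined (or implicitly typed) variables.
      include 'mpif.h'
      include 'pnetcdf.inc'

      integer :: ierr, status, ncid

      call MPI_Init(ierr)

! Assumed argument order: comm, path, cmode, info, ncid.
      status = nfmpi_create(MPI_COMM_WORLD, 'test.nc', NF_CLOBBER,
     &                      MPI_INFO_NULL, ncid)
      if (status .ne. NF_NOERR) then
         print *, 'nfmpi_create: ', nfmpi_strerror(status)
! Stop here: ncid is not valid, so do not go on to close it.
         call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
      endif

      status = nfmpi_close(ncid)
      if (status .ne. NF_NOERR) then
         print *, 'nfmpi_close: ', nfmpi_strerror(status)
      endif

      call MPI_Finalize(ierr)
      end program create_test
```

If something like this runs cleanly as a single process, the problem is
more likely in how the subroutine is called from the main program than in
the library calls themselves.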

Thanks,

Rob

On Wed, 7 Apr 2004 jabencke at ncsu.edu wrote:

> Rob,
> 
> It's old code but we constantly work on it.  I'm working on the traceback
> but that could prove difficult.
> 
> I'm trying to make these calls from within a subroutine so I'm not sure if
> I'm programming it incorrectly.  Also, we are using PVFS on the cluster. 
> I hope some of this helps.
> 
> Joseph
> 
> > Hi,
> >
> > Is this a program that has been run on other machines or used for a long
> > period of time, or is this a new code?  Can you get a traceback of where
> > that segfault occurred?
> >
> > Thanks,
> >
> > Rob
> >
> > On Wed, 7 Apr 2004 jabencke at ncsu.edu wrote:
> >
> >> Rob,
> >> First, thanks for your attention.
> >>
> >> I've done what you asked and I'm still getting errors but they are
> >> different.  This code I've sent you is not the main program but just a
> >> subroutine.  Now I'm seeing errors like:
> >>
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> /opt/mpich/ethernet/icc/bin/mpirun: line 1:  8391 Broken pipe
> >> /home/jabencke/pssdw/vhone -p4pg /home/jabencke/pssdw/PI8171 -p4wd
> >> /home/jabencke/pssdw
> >>
> >> AND
> >>
> >> p0_8391:  p4_error: interrupt SIGSEGV: 11
> >>
> >>
> >> Joseph Benckert
> >>
> >> > The reason I think that you might need to is that your error
> >> > "Intercommunicator is not allowed" looks like the result of getting
> >> > the wrong value for MPI_COMM_WORLD.
> >> >
> >> > In general you should include the MPI headers in MPI programs.  Can
> >> > you try it?
> >> >
> >> > Thanks,
> >> >
> >> > Rob
> >> >
> >> > On Tue, 6 Apr 2004 jabencke at ncsu.edu wrote:
> >> >
> >> >> Rob,
> >> >> Thanks for the quick response.  I don't believe I need to include it.
> >> >> I don't think the compiler would recognize the MPI_COMM_WORLD,
> >> >> otherwise.
> >> >> Anyway, we are using a Linux cluster (Rocks), and mpich.
> >> >>
> >> >> Joseph Benckert
> >> >> Department of Physics
> >> >> North Carolina State University
> >> >> jabencke at unity.ncsu.edu
> >> >>
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > What machine and MPI are you using here?  You should be including
> >> >> > an MPI header too; that might be the cause (or you might have just
> >> >> > neglected to include that).
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Rob
> >> >> >
> >> >> > On Tue, 6 Apr 2004 jabencke at ncsu.edu wrote:
> >> >> >
> >> >> >> I'm not sure what's causing the errors I'm having, listed below.
> >> >> >> I'm trying to baby step here and just create the file to start
> >> >> >> things off.  Below the errors is the code that causes the problem.
> >> >> >> It's repeated 8 times because it's an 8 processor test job.  Any
> >> >> >> help would be fantastic.
> >> >> >>
> >> >> >>  Can not open/create file
> >> >> >>
> >> >> >>  Can not open/create file
> >> >> >>
> >> >> >>  Can not open/create file
> >> >> >>
> >> >> >>  Can not open/create file
> >> >> >>
> >> >> >>  Can not open/create file
> >> >> >>
> >> >> >> -1073749200: MPI_File_open error = Intercommunicator is not allowed
> >> >> >>  Can not open/create file
> >> >> >>
> >> >> >> -1073746896: MPI_File_open error = Intercommunicator is not allowed
> >> >> >>  Can not open/create file
> >> >> >>
> >> >> >> -1073751120: MPI_File_open error = Intercommunicator is not allowed
> >> >> >>  Can not open/create file
> >> >> >>
> >> >> >> -1073747536: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> -1073749456: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> -1073746512: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> -1073750224: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> -1073747248: MPI_File_open error = Intercommunicator is not allowed
> >> >> >>
> >> >> >>
> >> >> >> Code here:
> >> >> >>
> >> >> >>       subroutine prin(prefix)
> >> >> >>
> >> >> >> ! Outputs ascii array if ndim = 1, else if ndim > 1 then
> >> >> >> ! write out hdf5 data file containing all variables (plus time).
> >> >> >>
> >> >> >>       include 'pnetcdf.inc'
> >> >> >>       include 'global.h'
> >> >> >>       include 'sweep.h'
> >> >> >>       include 'zone.h'
> >> >> >>
> >> >> >>       ! integer(HID_T) :: hdf_file !file id for hdf file
> >> >> >>       integer :: hdf_error !error var for hdf5 file
> >> >> >>
> >> >> >>       character(LEN=1) :: char
> >> >> >>       character(LEN=1) :: coord
> >> >> >>       character(LEN=4) :: tmp1, tmp2
> >> >> >>       character(LEN=5) :: prefix
> >> >> >>       character(LEN=15) :: filename
> >> >> >>
> >> >> >> !     Added (Fortran 90 style) for hdf5 stuff
> >> >> >>
> >> >> >>       ! INTEGER(HID_T) :: dsp_id  ! Dataspace ID
> >> >> >>       ! INTEGER(HID_T) :: dset_id !Dataset ID
> >> >> >>       INTEGER, DIMENSION(3) :: dims
> >> >> >>       INTEGER, DIMENSION(3) :: dimids
> >> >> >>       INTEGER :: status, ncid
> >> >> >>       INTEGER :: xDimID, yDimID, zDimID
> >> >> >>       INTEGER :: yMaxDimID, tsDimID
> >> >> >>       INTEGER :: density_varID, pressure_varID
> >> >> >>       INTEGER :: XVelocity_varID, YVelocity_varID, ZVelocity_varID
> >> >> >>       INTEGER :: XScale_varID, YScale_varID, ZScale_varID, time_varID
> >> >> >>
> >> >> >> !------------------------------------------------------------------------------
> >> >> >>
> >> >> >> ! Create filename from integer nfile (in global.h) and prefix such that filename
> >> >> >> ! looks like prefx.1000 where 1000 is the value of nfile
> >> >> >>
> >> >> >>       write(tmp1,910) nfile
> >> >> >>       write(tmp2,910) mype
> >> >> >>  910  format(i4)
> >> >> >>       do i = 1, 4
> >> >> >>          if ((tmp1(i:i)) .eq. ' ') tmp1(i:i) = '0'
> >> >> >>          if ((tmp2(i:i)) .eq. ' ') tmp2(i:i) = '0'
> >> >> >>       enddo
> >> >> >>       filename = prefix(1:5) // '_' // tmp1(1:4) // '.' // tmp2(1:4)
> >> >> >>       nfile = nfile + 1
> >> >> >>
> >> >> >>       if (ndim .eq. 1) then
> >> >> >>
> >> >> >> ! Keep 1D output simple, just write out in ascii...
> >> >> >>         open(unit=3,file=filename,form='formatted')
> >> >> >>         do i = 1, imax
> >> >> >>           write(3, 1003) zxa(i), zro(i,1,1),zpr(i,1,1), zux(i,1,1)
> >> >> >>         enddo
> >> >> >>         close(3)
> >> >> >>
> >> >> >>       else
> >> >> >>
> >> >> >>
> >> >> >> !     Initialize Dimensions
> >> >> >>       dims(1) = imax
> >> >> >>       dims(2) = js
> >> >> >>       if(ndim.eq.3) dims(3) = kmax
> >> >> >>
> >> >> >>       status = nfmpi_create(MPI_COMM_WORLD, filename,
> >> >> >>      &                      MPI_INFO_NULL, nf90_Clobber, ncid)
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>       if (status /= nf90_NoErr ) print *, nfmpi_strerror(status)
> >> >> >>
> >> >> >>       ! always a necessary statement to flush output
> >> >> >>       status = nfmpi_close(ncid)
> >> >> >>
> >> >> >>       endif
> >> >> >>
> >> >> >>       write(8,6000) filename, time, ncycle
> >> >> >>
> >> >> >>  6000 format('Wrote ',a10,' to disk at time =',1pe12.5,' (ncycle =',
> >> >> >>      &        i6,')')
> >> >> >>  1003 format(' ',4e13.5)
> >> >> >>
> >> >> >>       return
> >> >> >>       end