newbie errors
Rob Ross
rross at mcs.anl.gov
Thu Apr 8 12:12:11 CDT 2004
Joseph,
Likewise, cc'ing this back to parallel-netcdf.
How many processes do you need in order for the code to exhibit this
behavior? Could you run as a single process and gdb it? Can you get a
core file and see where it crashed?
Thanks,
Rob
On Wed, 7 Apr 2004 jabencke at ncsu.edu wrote:
> Rob,
>
> It's old code but we constantly work on it. I'm working on the traceback
> but that could prove difficult.
>
> I'm trying to make these calls from within a subroutine so I'm not sure if
> I'm programming it incorrectly. Also, we are using PVFS on the cluster.
> I hope some of this helps.
>
> Joseph
>
> > Hi,
> >
> > Is this a program that has been run on other machines or used for a long
> > period of time, or is this a new code? Can you get a traceback of where
> > that segfault occurred?
> >
> > Thanks,
> >
> > Rob
> >
> > On Wed, 7 Apr 2004 jabencke at ncsu.edu wrote:
> >
> >> Rob,
> >> First, thanks for your attention.
> >>
> >> I've done what you asked and I'm still getting errors but they are
> >> different. This code I've sent you is not the main program but just a
> >> subroutine. Now I'm seeing errors like:
> >>
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> Killed by signal 2.
> >> /opt/mpich/ethernet/icc/bin/mpirun: line 1: 8391 Broken pipe
> >> /home/jabencke/pssdw/vhone -p4pg /home/jabencke/pssdw/PI8171 -p4wd
> >> /home/jabencke/pssdw
> >>
> >> AND
> >>
> >> p0_8391: p4_error: interrupt SIGSEGV: 11
> >>
> >>
> >> Joseph Benckert
> >>
> >> > The reason I think that you might need to is that your error
> >> > "Intercommunicator is not allowed" looks like the result of getting the
> >> > wrong value for MPI_COMM_WORLD.
> >> >
> >> > In general you should include the MPI headers in MPI programs. Can you
> >> > try it?
> >> >
> >> > Thanks,
> >> >
> >> > Rob
> >> >
> >> > On Tue, 6 Apr 2004 jabencke at ncsu.edu wrote:
> >> >
> >> >> Rob,
> >> >> Thanks for the quick response. I don't believe I need to include it. I
> >> >> don't think the compiler would recognize the MPI_COMM_WORLD, otherwise.
> >> >> Anyway, we are using a Linux cluster (Rocks), and mpich.
> >> >>
> >> >> Joseph Benckert
> >> >> Department of Physics
> >> >> North Carolina State University
> >> >> jabencke at unity.ncsu.edu
> >> >>
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > What machine and MPI are you using here? You should be including an MPI
> >> >> > header too; that might be the cause (or you might have just neglected to
> >> >> > include that).
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Rob
> >> >> >
> >> >> > On Tue, 6 Apr 2004 jabencke at ncsu.edu wrote:
> >> >> >
> >> >> >> I'm not sure what's causing the errors I'm having, listed below. I'm
> >> >> >> trying to baby step here and just create the file to start things off.
> >> >> >> Below the errors is the code that causes the problem. It's repeated 8
> >> >> >> times because it's an 8 processor test job. Any help would be
> >> >> >> fantastic.
> >> >> >>
> >> >> >> Can not open/create file
> >> >> >>
> >> >> >> Can not open/create file
> >> >> >>
> >> >> >> Can not open/create file
> >> >> >>
> >> >> >> Can not open/create file
> >> >> >>
> >> >> >> Can not open/create file
> >> >> >>
> >> >> >> -1073749200: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> Can not open/create file
> >> >> >>
> >> >> >> -1073746896: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> Can not open/create file
> >> >> >>
> >> >> >> -1073751120: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> Can not open/create file
> >> >> >>
> >> >> >> -1073747536: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> -1073749456: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> -1073746512: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> -1073750224: MPI_File_open error = Intercommunicator is not allowed
> >> >> >> -1073747248: MPI_File_open error = Intercommunicator is not allowed
> >> >> >>
> >> >> >>
> >> >> >> Code here:
> >> >> >>
> >> >> >> subroutine prin(prefix)
> >> >> >>
> >> >> >> ! Outputs ascii array if ndim = 1, else if ndim > 1 then
> >> >> >> ! write out hdf5 data file containing all variables (plus time).
> >> >> >>
> >> >> >> include 'pnetcdf.inc'
> >> >> >> include 'global.h'
> >> >> >> include 'sweep.h'
> >> >> >> include 'zone.h'
> >> >> >>
> >> >> >> ! integer(HID_T) :: hdf_file !file id for hdf file
> >> >> >> integer :: hdf_error !error var for hdf5 file
> >> >> >>
> >> >> >> character(LEN=1) :: char
> >> >> >> character(LEN=1) :: coord
> >> >> >> character(LEN=4) :: tmp1, tmp2
> >> >> >> character(LEN=5) :: prefix
> >> >> >> character(LEN=15) :: filename
> >> >> >>
> >> >> >> ! Added (Fortran 90 style) for hdf5 stuff
> >> >> >>
> >> >> >> ! INTEGER(HID_T) :: dsp_id ! Dataspace ID
> >> >> >> ! INTEGER(HID_T) :: dset_id !Dataset ID
> >> >> >> INTEGER, DIMENSION(3) :: dims
> >> >> >> INTEGER, DIMENSION(3) :: dimids
> >> >> >> INTEGER :: status, ncid
> >> >> >> INTEGER :: xDimID, yDimID, zDimID
> >> >> >> INTEGER :: yMaxDimID, tsDimID
> >> >> >> INTEGER :: density_varID, pressure_varID
> >> >> >> INTEGER :: XVelocity_varID, YVelocity_varID, ZVelocity_varID
> >> >> >> INTEGER :: XScale_varID, YScale_varID, ZScale_varID, time_varID
> >> >> >>
> >> >> >> !------------------------------------------------------------------------------
> >> >> >>
> >> >> >> ! Create filename from integer nfile (in global.h) and prefix such that filename
> >> >> >> ! looks like prefx.1000 where 1000 is the value of nfile
> >> >> >>
> >> >> >> write(tmp1,910) nfile
> >> >> >> write(tmp2,910) mype
> >> >> >> 910 format(i4)
> >> >> >> do i = 1, 4
> >> >> >> if ((tmp1(i:i)) .eq. ' ') tmp1(i:i) = '0'
> >> >> >> if ((tmp2(i:i)) .eq. ' ') tmp2(i:i) = '0'
> >> >> >> enddo
> >> >> >> filename = prefix(1:5) // '_' // tmp1(1:4) // '.' // tmp2(1:4)
> >> >> >> nfile = nfile + 1
> >> >> >>
> >> >> >> if (ndim .eq. 1) then
> >> >> >>
> >> >> >> ! Keep 1D output simple, just write out in ascii...
> >> >> >> open(unit=3,file=filename,form='formatted')
> >> >> >> do i = 1, imax
> >> >> >> write(3, 1003) zxa(i), zro(i,1,1),zpr(i,1,1), zux(i,1,1)
> >> >> >> enddo
> >> >> >> close(3)
> >> >> >>
> >> >> >> else
> >> >> >>
> >> >> >>
> >> >> >> ! Initialize Dimensions
> >> >> >> dims(1) = imax
> >> >> >> dims(2) = js
> >> >> >> if(ndim.eq.3) dims(3) = kmax
> >> >> >>
> >> >> >> status = nfmpi_create(MPI_COMM_WORLD, filename,
> >> >> >> & MPI_INFO_NULL, nf90_Clobber, ncid)
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> if (status /= nf90_NoErr ) print *, nfmpi_strerror(status)
> >> >> >>
> >> >> >> ! always a necessary statement to flush output
> >> >> >> status = nfmpi_close(ncid)
> >> >> >>
> >> >> >> endif
> >> >> >>
> >> >> >> write(8,6000) filename, time, ncycle
> >> >> >>
> >> >> >> 6000 format('Wrote ',a10,' to disk at time =',1pe12.5,' (ncycle =',
> >> >> >> & i6,')')
> >> >> >> 1003 format(' ',4e13.5)
> >> >> >>
> >> >> >> return
> >> >> >> end
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >
> >> >
> >>
> >>
> >
> >
>
>
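
The header point raised in this thread can be sketched as follows. In
fixed-form Fortran with implicit typing, a routine that does not include
'mpif.h' still compiles: MPI_COMM_WORLD is silently treated as an
uninitialized local integer, which would be consistent with the
"Intercommunicator is not allowed" errors quoted above. This is only a
minimal sketch, not the poster's code. The program name create_sketch and
the file name sketch.nc are hypothetical, and it assumes the parallel-netcdf
Fortran 77 argument order (comm, path, cmode, info, ncid) and the
NF_CLOBBER / NF_NOERR constants supplied by pnetcdf.inc.

```fortran
      program create_sketch
!     Hypothetical stand-alone test; not the subroutine from the thread.
!     "implicit none" makes a missing header a compile-time error
!     instead of a run-time surprise.
      implicit none
      include 'mpif.h'
      include 'pnetcdf.inc'
      integer ierr, status, ncid

      call MPI_Init(ierr)

!     Assumed argument order: comm, path, cmode, info, ncid.
      status = nfmpi_create(MPI_COMM_WORLD, 'sketch.nc',
     &                      NF_CLOBBER, MPI_INFO_NULL, ncid)

      if (status .ne. NF_NOERR) then
!        Report the library's own error text and skip the close.
         print *, nfmpi_strerror(status)
      else
         status = nfmpi_close(ncid)
      endif

      call MPI_Finalize(ierr)
      end
```

With implicit none in place, dropping the include 'mpif.h' line should make
the compiler reject MPI_COMM_WORLD as undeclared, which is one way to check
whether a given routine is really seeing the MPI definitions.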