[MPICH] Parallel I/O problems on 64-bit machine ( please help :-( )

Yusong Wang ywang25 at aps.anl.gov
Tue May 23 16:53:58 CDT 2006


You may already have seen this in the manual, but just in case it helps:

D.4 Q: When I use the g95 Fortran compiler on a 64-bit platform, some of
the tests fail.

A: The g95 compiler incorrectly defines the default Fortran integer as a
64-bit integer while defining Fortran reals as 32-bit values (the
Fortran standard requires that INTEGER and REAL be the same size). This
was apparently done to allow a Fortran INTEGER to hold the value of a
pointer, rather than requiring the programmer to select an INTEGER of a
suitable KIND. To force the g95 compiler to correctly implement the
Fortran standard, use the -i4 flag. For example, set the environment
variable F90FLAGS before configuring MPICH2:

    setenv F90FLAGS "-i4"

G95 users should note that there are (at this writing) two distributions
of g95 for 64-bit Linux platforms. One uses 32-bit integers and reals
(and conforms to the Fortran standard) and one uses 32-bit integers and
64-bit reals. We recommend using the one that conforms to the standard.
(Note that the standard specifies the ratio of sizes, not the absolute
sizes, so a Fortran 95 compiler that used 64 bits for both INTEGER and
REAL would also conform to the Fortran standard. However, such a
compiler would need to use 128 bits for DOUBLE PRECISION quantities.)
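
To illustrate why the INTEGER size matters, here is a rough sketch (in Python, not taken from the manual) of one plausible reading of the "array_of_sizes[1]=0" message in the quoted mail below. It assumes the application passes 8-byte integers on a little-endian x86_64 machine while the MPI library was built for 4-byte Fortran integers; the values 64, 64, 175 are the example extents from that mail, and the variable names are made up for illustration.

```python
import struct

# Example extents mentioned in the quoted message (illustrative values).
sizes = [64, 64, 175]

# Store them as 8-byte little-endian integers, the way a compiler with an
# 8-byte default INTEGER would lay them out in memory on x86_64.
raw = struct.pack("<3q", *sizes)

# Reinterpret the same bytes as 4-byte integers, which is what a library
# expecting 4-byte Fortran INTEGERs would read from that memory.
as_int32 = list(struct.unpack("<6i", raw))

# A 3-D call would only look at the first three 4-byte slots.
first_three = as_int32[:3]

print(as_int32)     # [64, 0, 64, 0, 175, 0]
print(first_three)  # [64, 0, 64]
```

Under these assumptions the second element the library sees is the zero upper half of the first 8-byte value, which would match an "Invalid value in array_of_sizes[1]=0" complaint. This is only a sketch of the mismatch, not a confirmed diagnosis.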

Yusong

On Tue, 2006-05-23 at 14:48 -0400, Peter Diamessis wrote:
> Hi again,
> 
> I'm still obsessing as to why MPI I/O fails on my 64-bit machine.
> I've decided to set MPICH2 aside and work with MPICH v1.2.6, which
> is the one version that worked reliably for me. This is the latest
> I've observed.
> 
> I guessed that some integer argument must be passed incorrectly when using
> a 64-bit machine. I recompiled the code (I use Absoft Pro Fortran 10.0)
> and forced the default size of integers to be 8 bytes. Lo and behold, my I/O
> routine crashes at an earlier point with the following interesting message:
> 
> 0 - MPI_TYPE_CREATE_SUBARRAY: Invalid value in array_of_sizes[1]=0 .
> 
> Now, all the elements of the array of sizes should be non-zero integers,
> e.g. 64, 64, 175. Is some information on integers being screwed up in the
> 64-bit layout?
> 
> Note that after a few secs. of hanging I also get the following:
> 
> p0_25936: (0.089844) net_send: could not write to fd=4, errno = 32
> 
> This is the exact same error I get when running ' make testing ' after
> having installed MPICH, i.e.:
> 
> *** Testing Type_struct from Fortran ***
> Differences in structf.out
> 2,7c2
> < 0 - MPI_ADDRESS : Address of location given to MPI_ADDRESS does not fit in Fortran integer
> < [0]  Aborting program !
> < [0] Aborting program!
> < p0_25936:  p4_error: : 972
> < Killed by signal 2.
> < p0_25936: (0.089844) net_send: could not write to fd=4, errno = 32
> 
> Again, any help would be hugely appreciated. I'll buy you guys beers !
> 
> Many thanks,
> 
> Peter
> 
> 
> ----- Original Message ----- 
> From: "Peter Diamessis" <pjd38 at cornell.edu>
> To: <mpich-discuss at mcs.anl.gov>
> Sent: Monday, May 22, 2006 2:33 PM
> Subject: [MPICH] Parallel I/O problems on 64-bit machine ( please help :-( )
> 
> 
> > Hello folks,
> >
> > I'm writing this note to ask for some help with running MPI on
> > a dual proc. 64-bit Linux box I just acquired. I've written a similar
> > note to the mpi-bugs address but would appreciate any additional
> > help from anyone else in the community.
> >
> > I'm using MPICH v1.2.7p1,
> > which, when tested, seems to work wonderfully with everything except for
> > some specific parallel I/O calls.
> >
> > Specifically, whenever there is a call to MPI_FILE_WRITE_ALL
> > or MPI_FILE_READ_ALL a SIGSEGV error pops up. Note that
> > these I/O dumps are part of a larger CFD code which
> > has worked fine on either a 32-bit dual proc. Linux workstation
> > or the USC-HPCC Linux cluster (where I was a postdoc).
> >
> > In my message to mpi-bugs, I did attach a variety of files that
> > could provide additional insight. In this case I'm attaching only
> > the Fortran source code. I can gladly provide more material to
> > anyone who may be interested. The troublesome Fortran call is:
> >
> >   call MPI_FILE_WRITE_ALL(fh, tempout, local_array_size, MPI_REAL, MPI_STATUS_IGNORE)
> >
> > Upon calling this, the program crashes with a SIGSEGV (signal 11) error.
> > Evidently, some memory is accessed out of bounds?
> >
> > Tempout is a single precision (Real with kind=4) 3-D array, which has a
> > total local number of elements on each processor equal to
> > local_array_size. If I change MPI_STATUS_IGNORE to status_array,ierr
> > (where status_array is appropriately dimensioned) I find that upon error,
> > printing out the elements of status_array yields huge values.
> > This error is always localized on processor (N+1)/2 (proc. numbering
> > goes from 0 to N-1).
> >
> > I installed MPICH2 only to observe the same results.
> > Calls to MPI_FILE_READ_ALL will also produce identical effects.
> > I'll reiterate that we've never had problems with this code on 32-bit
> > machines.
> >
> > Note that uname -a returns:
> >
> > Linux pacific.cee.cornell.edu 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST
> > 2005 x86_64 x86_64 x86_64 GNU/Linux
> >
> > Am I running into problems because I've got a 64-bit configured Linux on a
> > 64-bit machine?
> >
> > Any help would be HUGELY appreciated. The ability to use MPI2 parallel I/O on
> > our workstation would greatly help us crunch through some existing large
> > datafiles generated on 32-bit machines.
> >
> > Cheers,
> >
> > Peter
> >
> > -------------------------------------------------------------
> > Peter Diamessis
> > Assistant Professor
> > Environmental Fluid Mechanics & Hydrology
> > School of Civil and Environmental Engineering
> > Cornell University
> > Ithaca, NY 14853
> > Phone: (607)-255-1719 --- Fax: (607)-255-9004
> > pjd38 at cornell.edu
> > http://www.cee.cornell.edu/fbxk/fcbo.cfm?pid=494
> >
> > 
> 
> 



