[MPICH] Parallel I/O problems on 64-bit machine ( please help:-( )
Peter Diamessis
pjd38 at cornell.edu
Tue May 23 17:52:27 CDT 2006
Many thanks, Yusong,
I'll contact the Absoft people to see if there is a similar issue
with their F90-95 compiler. I have to be on travel tomorrow
but I'll get back to this on Thursday.
The pointer is much appreciated,
Peter
----- Original Message -----
From: "Yusong Wang" <ywang25 at aps.anl.gov>
To: "Peter Diamessis" <pjd38 at cornell.edu>
Cc: <mpich-discuss at mcs.anl.gov>
Sent: Tuesday, May 23, 2006 5:53 PM
Subject: Re: [MPICH] Parallel I/O problems on 64-bit machine ( please help:-( )
> You might have read this in the manual already, but just in case it helps:
>
> D.4 Q: When I use the g95 Fortran compiler on a 64-bit platform, some of
> the tests fail
>
> A: The g95 compiler incorrectly defines the default Fortran integer as a
> 64-bit integer while defining Fortran reals as 32-bit values (the
> Fortran standard requires that INTEGER and REAL be the same size). This
> was apparently done to allow a Fortran INTEGER to hold the value of a
> pointer, rather than requiring the programmer to select an INTEGER of a
> suitable KIND. To force the g95 compiler to correctly implement the
> Fortran standard, use the -i4 flag. For example, set the environment
> variable F90FLAGS before configuring MPICH2:
>
>     setenv F90FLAGS "-i4"
>
> G95 users should note that there (at this writing) are two distributions of
> g95 for 64-bit Linux platforms. One uses 32-bit integers and reals (and
> conforms to the Fortran standard) and one uses 32-bit integers and 64-bit
> reals. We recommend using the one that conforms to the standard
> (note that the standard specifies the ratio of sizes, not the absolute
> sizes, so a Fortran 95 compiler that used 64 bits for both INTEGER and
> REAL would also conform to the Fortran standard. However, such a
> compiler would need to use 128 bits for DOUBLE PRECISION quantities).
>
> Yusong
>
> On Tue, 2006-05-23 at 14:48 -0400, Peter Diamessis wrote:
>> Hi again,
>>
>> I'm still obsessing as to why MPI I/O fails on my 64-bit machine.
>> I've decided to set MPICH2 aside and work with MPICH v1.2.6 which
>> is the one version that worked reliably for me. This is the latest I
>> observed.
>>
>> I guessed that some integer argument must be passed wrong when using
>> a 64-bit machine. I recompiled the code (I use Absoft Pro Fortran 10.0)
>> and forced the default size of integers to be 8 bytes. Lo and behold, my I/O
>> routine crashes at an earlier point with the following interesting
>> message:
>>
>> 0 - MPI_TYPE_CREATE_SUBARRAY: Invalid value in array_of_sizes[1]=0 .
>>
>> Now, all the elements of the array of sizes should be non-zero integers,
>> e.g. 64, 64, 175. Is some information on integers being screwed up in
>> the 64-bit layout?
>>
>> Note that after a few seconds of hanging I also get the following:
>>
>> p0_25936: (0.089844) net_send: could not write to fd=4, errno = 32
>>
>> This is the exact same error I get when running ' make testing ' after
>> having installed MPICH, i.e.:
>>
>> *** Testing Type_struct from Fortran ***
>> Differences in structf.out
>> 2,7c2
>> < 0 - MPI_ADDRESS : Address of location given to MPI_ADDRESS does not fit in Fortran integer
>> < [0] Aborting program !
>> < [0] Aborting program!
>> < p0_25936: p4_error: : 972
>> < Killed by signal 2.
>> < p0_25936: (0.089844) net_send: could not write to fd=4, errno = 32
>>
>> Again, any help would be hugely appreciated. I'll buy you guys beers !
>>
>> Many thanks,
>>
>> Peter
>>
>>
>> ----- Original Message -----
>> From: "Peter Diamessis" <pjd38 at cornell.edu>
>> To: <mpich-discuss at mcs.anl.gov>
>> Sent: Monday, May 22, 2006 2:33 PM
>> Subject: [MPICH] Parallel I/O problems on 64-bit machine ( please help :-( )
>>
>>
>> > Hello folks,
>> >
>> > I'm writing this note to ask for some help with running MPI on
>> > a dual-processor 64-bit Linux box I just acquired. I've written a similar
>> > note to the mpi-bugs address but would appreciate any additional
>> > help from anyone else in the community.
>> >
>> > I'm using MPICH v1.2.7p1,
>> > which, when tested, seems to work wonderfully with everything except
>> > for some specific parallel I/O calls.
>> >
>> > Specifically, whenever there is a call to MPI_FILE_WRITE_ALL
>> > or MPI_FILE_READ_ALL, a SIGSEGV error pops up. Note that
>> > these I/O dumps are part of a larger CFD code which
>> > has worked fine on both a 32-bit dual-processor Linux workstation
>> > and the USC-HPCC Linux cluster (where I was a postdoc).
>> >
>> > In my message to mpi-bugs, I did attach a variety of files that
>> > could provide additional insight. In this case I'm attaching only
>> > the Fortran source code. I can gladly provide more material to
>> > anyone who may be interested. The troublesome Fortran call is:
>> >
>> > call MPI_FILE_WRITE_ALL(fh, tempout, local_array_size, MPI_REAL, MPI_STATUS_IGNORE)
>> >
>> > Upon calling this, the program crashes with a SIGSEGV 11 error. Evidently,
>> > some memory is being accessed out of bounds?
>> >
>> > Tempout is a single-precision (REAL with kind=4) 3-D array, whose
>> > total local number of elements on each processor equals local_array_size.
>> > If I change MPI_STATUS_IGNORE to status_array,ierr (where
>> > status_array is appropriately dimensioned) I find that upon error,
>> > printing out the elements of status_array yields huge values.
>> > This error is always localized on processor (N+1)/2 (processor
>> > numbering goes from 0 to N-1).
>> >
>> > I installed MPICH2 only to observe the same results.
>> > Calls to MPI_FILE_READ_ALL will also produce identical effects.
>> > I'll reiterate that we've never had problems with this code on 32-bit
>> > machines.
>> >
>> > Note that uname -a returns:
>> >
>> > Linux pacific.cee.cornell.edu 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > Am I running into problems because I've got a 64-bit configured Linux
>> > on a 64-bit machine?
>> >
>> > Any help would be HUGELY appreciated. The ability to use MPI2 parallel I/O
>> > on our workstation would greatly help us crunch through some existing
>> > large datafiles generated on 32-bit machines.
>> >
>> > Cheers,
>> >
>> > Peter
>> >
>> > -------------------------------------------------------------
>> > Peter Diamessis
>> > Assistant Professor
>> > Environmental Fluid Mechanics & Hydrology
>> > School of Civil and Environmental Engineering
>> > Cornell University
>> > Ithaca, NY 14853
>> > Phone: (607)-255-1719 --- Fax: (607)-255-9004
>> > pjd38 at cornell.edu
>> > http://www.cee.cornell.edu/fbxk/fcbo.cfm?pid=494
>> >
>> >
>>
>>
>
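
One possible reading of the "Invalid value in array_of_sizes[1]=0" message quoted above, offered as a sketch and not a confirmed diagnosis: if the application is compiled with 8-byte default integers while the MPI library was built expecting 4-byte Fortran integers (an assumption, since the thread does not say how MPICH was configured), the library would read the same bytes as twice as many 4-byte values. The small program below only illustrates that reinterpretation. It does not call MPI, the program and variable names are hypothetical, and the sizes 64, 64, 175 are the example values from the thread.

```fortran
program int_width_mismatch
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none

  ! Example sizes from the thread, stored as 8-byte integers
  ! (what the application would pass if default integers are 8 bytes).
  integer(int64) :: sizes8(3) = [64_int64, 64_int64, 175_int64]

  ! The same 24 bytes viewed as six 4-byte integers
  ! (what a library expecting 4-byte integers would read).
  integer(int32) :: sizes4(6)

  ! transfer() copies the bit pattern; it does not convert values.
  sizes4 = transfer(sizes8, sizes4)

  print '(a,3(1x,i0))', 'as 8-byte integers:', sizes8
  print '(a,6(1x,i0))', 'as 4-byte integers:', sizes4

  ! On a little-endian machine such as x86_64, sizes4(2) (index 1 in
  ! C numbering) holds the upper half of the first 8-byte value.
  print '(a,1x,i0)', 'element at C index [1]:', sizes4(2)
end program int_width_mismatch
```

On a little-endian x86_64 machine the second line should read 64 0 64 0 175 0, so the element at C index 1 is zero even though every intended size is non-zero. If this is what is happening, building the application and the MPI library with the same default INTEGER size would be the thing to check first.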