[MPICH] Parallel I/O problems on 64-bit machine ( please help :-( )

Peter Diamessis pjd38 at cornell.edu
Tue May 23 13:48:27 CDT 2006


Hi again,

I'm still obsessing as to why MPI I/O fails on my 64-bit machine.
I've decided to set MPICH2 aside and work with MPICH v1.2.6, which
is the one version that has worked reliably for me. This is the latest
I have observed.

I guessed that some integer argument must be passed wrong when using
a 64-bit machine. I recompiled the code (I use Absoft Pro Fortran 10.0)
and forced the default size of integers to be 8 bytes. Lo and behold, my I/O
routine crashes at an earlier point with the following interesting message:

0 - MPI_TYPE_CREATE_SUBARRAY: Invalid value in array_of_sizes[1]=0 .

Now, all the elements of the array of sizes should be non-zero integers,
e.g. 64, 64, 175. Is some information on integers being screwed up in the
64-bit layout?
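
One possible explanation, sketched below, assumes that the MPICH library was
built for 4-byte Fortran INTEGERs while the application now passes 8-byte ones.
That is an assumption, not something confirmed in this thread. On a
little-endian machine such as x86_64, the bytes of an 8-byte array holding
64, 64, 175 look like 64, 0, 64, 0, 175, 0 when read as 4-byte integers, so a
library expecting 4-byte values would find a zero in the second slot
(array_of_sizes[1] in C numbering). The program name and variable names are
illustrative only; no MPI calls are involved.

```fortran
program int_size_mismatch
  implicit none
  ! Kinds that map to 4-byte and 8-byte integers on typical compilers.
  integer, parameter :: i4 = selected_int_kind(9)
  integer, parameter :: i8 = selected_int_kind(18)
  integer(i8) :: sizes8(3)   ! what the application passes with 8-byte integers
  integer(i4) :: seen4(6)    ! the same 24 bytes viewed as 4-byte integers
  integer :: i

  sizes8 = (/ 64_i8, 64_i8, 175_i8 /)

  ! Reinterpret the bytes without any value conversion.
  seen4 = transfer(sizes8, seen4)

  do i = 1, 6
     ! C indexes from 0, so Fortran element i corresponds to C element i-1.
     print '(a,i0,a,i0)', 'array_of_sizes[', i-1, '] = ', seen4(i)
  end do
end program int_size_mismatch
```

If this is what is happening, the usual remedy is to make the integer size of
the application match the one the MPI library was configured with, rather than
changing only one side.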

Note that after a few secs. of hanging I also get the following:

p0_25936: (0.089844) net_send: could not write to fd=4, errno = 32

This is the exact same error I get when running ' make testing ' after
having installed MPICH, i.e.:

*** Testing Type_struct from Fortran ***
Differences in structf.out
2,7c2
< 0 - MPI_ADDRESS : Address of location given to MPI_ADDRESS does not fit in Fortran integer
< [0]  Aborting program !
< [0] Aborting program!
< p0_25936:  p4_error: : 972
< Killed by signal 2.
< p0_25936: (0.089844) net_send: could not write to fd=4, errno = 32
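
The MPI_ADDRESS message is consistent with a size limit: the MPI-1 routine
MPI_ADDRESS returns the address in a default INTEGER, and a 64-bit address can
exceed what a 4-byte INTEGER holds. A minimal sketch of that limit follows. The
address value is hypothetical and chosen only for illustration; real addresses
depend on the OS and the run.

```fortran
program address_fit
  implicit none
  integer, parameter :: i4 = selected_int_kind(9)
  integer, parameter :: i8 = selected_int_kind(18)
  ! Hypothetical 64-bit address (2**47), not taken from the actual run.
  integer(i8), parameter :: addr = 140737488355328_i8

  print '(a,i0)', 'largest 4-byte INTEGER: ', huge(1_i4)
  print '(a,i0)', 'hypothetical address:   ', addr
  ! Compare in 8-byte arithmetic so the test itself cannot overflow.
  print '(a,l1)', 'fits in 4-byte INTEGER: ', addr <= int(huge(1_i4), i8)
end program address_fit
```

MPI-2 added MPI_GET_ADDRESS, which returns the address in an
INTEGER(KIND=MPI_ADDRESS_KIND) for this reason. Whether that routine is
available depends on the MPICH version installed.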

Again, any help would be hugely appreciated. I'll buy you guys beers !

Many thanks,

Peter


----- Original Message ----- 
From: "Peter Diamessis" <pjd38 at cornell.edu>
To: <mpich-discuss at mcs.anl.gov>
Sent: Monday, May 22, 2006 2:33 PM
Subject: [MPICH] Parallel I/O problems on 64-bit machine ( please help :-( )


> Hello folks,
>
> I'm writing this note to ask for some help with running MPI on
> a dual proc. 64-bit Linux box I just acquired. I've written a similar
> note to the mpi-bugs address but would appreciate any additional
> help from anyone else in the community.
>
> I'm using MPICH v1.2.7p1,
> which, when tested, seems to work wonderfully with everything except for
> some specific parallel I/O calls.
>
> Specifically, whenever there is a call to MPI_FILE_WRITE_ALL
> or MPI_FILE_READ_ALL a SIGSEGV error pops up. Note that
> these I/O dumps are part of a greater CFD code which
> has worked fine on either a 32-bit dual proc. Linux workstation
> or the USC-HPCC Linux cluster (where I was a postdoc).
>
> In my message to mpi-bugs, I did attach a variety of files that
> could provide additional insight. In this case I'm attaching only
> the Fortran source code. I can gladly provide more material to
> anyone who may be interested. The troublesome Fortran call is:
>
>   call MPI_FILE_WRITE_ALL(fh, tempout, local_array_size,
>> MPI_REAL,
>> MPI_STATUS_IGNORE)
>
> Upon calling this, the program crashes with a SIGSEGV 11 error. Evidently,
> some memory is being accessed out of bounds?
>
> Tempout is a single precision (Real with kind=4) 3-D array, which has a
> total local number of elements on each processor equal to local_array_size.
> If I change MPI_STATUS_IGNORE to status_array,ierr (where
> status_array is appropriately dimensioned) I find that upon error,
> printing out the elements of status_array yields huge values.
> This error is always localized on processor (N+1)/2 (proc. numbering
> goes from 0 to N-1).
>
> I installed MPICH2 only to observe the same results.
> Calls to MPI_FILE_READ_ALL will also produce identical effects.
> I'll reiterate that we've never had problems with this code on 32-bit
> machines.
>
> Note that uname -a returns:
>
> Linux pacific.cee.cornell.edu 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST
> 2005 x86_64 x86_64 x86_64 GNU/Linux
>
> Am I running into problems because I've got a 64-bit configured Linux on a
> 64-bit machine?
>
> Any help would be HUGELY appreciated. The ability to use MPI2 parallel I/O on
> our workstation would greatly help us crunch through some existing large
> datafiles generated on 32-bit machines.
>
> Cheers,
>
> Peter
>
> -------------------------------------------------------------
> Peter Diamessis
> Assistant Professor
> Environmental Fluid Mechanics & Hydrology
> School of Civil and Environmental Engineering
> Cornell University
> Ithaca, NY 14853
> Phone: (607)-255-1719 --- Fax: (607)-255-9004
> pjd38 at cornell.edu
> http://www.cee.cornell.edu/fbxk/fcbo.cfm?pid=494
>
> 




