[MPICH] tests are failing with different numbers

Rajeev Thakur thakur at mcs.anl.gov
Mon Sep 26 11:04:44 CDT 2005


If you use the MPI-IO functions for writing, and fcntl file locks work
correctly on the NFS installation, you should get the right data in the
files.
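
For illustration, here is a minimal sketch in C of the kind of MPI-IO
write being described (the file path, sizes and data below are made up
for the example):

    #include <mpi.h>

    #define N 100

    int main(int argc, char **argv)
    {
        MPI_File   fh;
        MPI_Status status;
        MPI_Offset offset;
        int        rank, i, buf[N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* each rank prepares its own block of N integers */
        for (i = 0; i < N; i++)
            buf[i] = rank * N + i;

        /* open one shared file on the NFS-mounted directory
           (the path here is only an example) */
        MPI_File_open(MPI_COMM_WORLD, "/nfs/scratch/testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* each rank writes its block at a disjoint offset; ROMIO
           relies on fcntl locks to keep this consistent over NFS */
        offset = (MPI_Offset)rank * N * sizeof(int);
        MPI_File_write_at(fh, offset, buf, N, MPI_INT, &status);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }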

Rajeev
 

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Jerry Mersel
> Sent: Monday, September 26, 2005 3:37 AM
> To: Rob Ross
> Cc: jerry.mersel at weizmann.ac.il; Ashley Pittman; 
> mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] tests are failing with different numbers
> 
> The data written to the exported/mounted directory gets written and
> does not interfere with what's going on with other machines.
> 
>                             Regards,
>                              Jerry
> 
> E.g., certain uw-imap folder formats are not NFS safe.
> 
> 
> > What do you mean by that?
> >
> > Rob
> >
> > Jerry Mersel wrote:
> >> Another question...
> >>
> >>   Is MPICH (and,or MPICH2) NFS safe?
> >>
> >>                        Regards,
> >>                         Jerry
> >>
> >>
> >>
> >>>On Tue, 2005-09-20 at 16:54 +0300, Jerry Mersel wrote:
> >>>
> >>>>Hi:
> >>>>  I've installed MPICH 1.2.6 on a cluster which consists of several
> >>>>  dual-Opteron machines running Red Hat AS 4.0.
> >>>>  A user running an application on 4 processors has brought to my
> >>>>  attention that two runs of the same binary produce two different
> >>>>  sets of results.
> >>>>
> >>>>  I then ran the tests that come with MPICH (I know I should have
> >>>>  done it before, we just won't tell anybody) and came up with
> >>>>  errors. Here they are:
> >>>>          Differences in issendtest.out
> >>>>          Differences in structf.out
> >>>>          *** Checking for differences from expected output ***
> >>>>          Differences in issendtest.out
> >>>>          Differences in structf.out
> >>>>          p0_3896:  p4_error: net_recv read:  probable EOF on socket: 1
> >>>>          p0_10524:  p4_error: : 972
> >>>>          0 - MPI_ADDRESS : Address of location given to MPI_ADDRESS
> >>>>          does not fit in Fortran integer
> >>>>
> >>>>  I've tried this with the gcc compiler and the pgi compiler, with
> >>>>  no options and with many different options - the results are the
> >>>>  same.
> >>>
> >>>>  I tried using MPICH from the pgi site; the user still got
> >>>>  different results on different runs.
> >>>
> >>>I've seen something similar to this. Firstly, structf is testing
> >>>something that is mathematically impossible, i.e. storing a (in your
> >>>case 64-bit) pointer in a 32-bit integer. This sometimes works
> >>>(depending on what the pointer is) but often doesn't. We have some
> >>>patches for this (I believe pgi also ship them), but it's still not
> >>>a 100% cure. Unless you actually have an application that suffers
> >>>from this, you don't need the patch.
> >>>
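> >>>For illustration, a minimal C sketch of the same call (the variable
> >>>names are made up for the example): MPI_Address returns the address
> >>>in an MPI_Aint, an address-sized integer, which is why the C binding
> >>>does not hit this limit, whereas the Fortran binding only has a
> >>>default INTEGER to put the address in.
> >>>
> >>>    #include <mpi.h>
> >>>    #include <stdio.h>
> >>>
> >>>    int main(int argc, char **argv)
> >>>    {
> >>>        int      value = 42;
> >>>        MPI_Aint addr;   /* address-sized: 64-bit on an Opteron */
> >>>
> >>>        MPI_Init(&argc, &argv);
> >>>
> >>>        /* the absolute address of 'value' fits in an MPI_Aint,
> >>>           so nothing is truncated on a 64-bit machine */
> >>>        MPI_Address(&value, &addr);
> >>>        printf("address = %ld\n", (long)addr);
> >>>
> >>>        MPI_Finalize();
> >>>        return 0;
> >>>    }
> >>>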
> >>>Secondly, Red Hat AS 4.0 has some odd features that effectively mean
> >>>you're supposed to get different results from running the same
> >>>program twice. In particular it has exec-shield-randomize enabled,
> >>>which moves the stack about between runs, and it runs a cron job
> >>>overnight which randomises the load addresses of your installed
> >>>shared libraries. This means that the same binary on two different
> >>>nodes will have a different address for the stack and a different
> >>>address for mmap()/shared libraries.
> >>>
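> >>>A quick way to see the randomisation (a small sketch, nothing
> >>>MPI-specific) is to print the address of a stack variable and of an
> >>>object that lives inside the shared libc, then run the binary twice
> >>>and compare:
> >>>
> >>>    #include <stdio.h>
> >>>
> >>>    int main(void)
> >>>    {
> >>>        int on_stack = 0;
> >>>
> >>>        /* with stack randomisation enabled the first address
> >>>           changes on every run; the second points into libc's
> >>>           data and moves when the shared library's load
> >>>           address changes */
> >>>        printf("stack variable at %p\n", (void *)&on_stack);
> >>>        printf("libc stdin     at %p\n", (void *)stdin);
> >>>        return 0;
> >>>    }
> >>>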
> >>>Most applications, however, should hide this away from you, and MPI
> >>>itself is designed to be independent of this type of configuration
> >>>change. It is fairly easy to introduce artificial dependencies
> >>>without meaning to, though.
> >>>
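> >>>A contrived sketch of how such a dependency creeps in: anything that
> >>>lets an address leak into the numerical work, for example seeding a
> >>>random-number generator from a pointer, ties the results to the
> >>>(randomised) memory layout.
> >>>
> >>>    #include <stdio.h>
> >>>    #include <stdlib.h>
> >>>
> >>>    int main(void)
> >>>    {
> >>>        double buf[16];
> >>>        int    i;
> >>>
> >>>        /* looks harmless, but the seed now depends on where the
> >>>           stack happens to be, so two runs of the same binary
> >>>           produce different "random" data */
> >>>        srand((unsigned)(size_t)&buf[0]);
> >>>
> >>>        for (i = 0; i < 16; i++)
> >>>            buf[i] = rand() / (double)RAND_MAX;
> >>>
> >>>        printf("first value: %f\n", buf[0]);
> >>>        return 0;
> >>>    }
> >>>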
> >>>And of course there are the normal problems of floating-point
> >>>accuracy; some apps aren't actually supposed to get identical
> >>>results between runs, merely similar ones...
> >>>
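> >>>The floating-point effect is easy to demonstrate with a minimal
> >>>sketch: adding the same three numbers in a different association,
> >>>which is what a sum over partial results does if they are combined
> >>>in whatever order they happen to arrive, gives different answers.
> >>>
> >>>    #include <stdio.h>
> >>>
> >>>    int main(void)
> >>>    {
> >>>        double a = 1.0e16, b = -1.0e16, c = 1.0;
> >>>
> >>>        /* same numbers, different association */
> >>>        printf("(a + b) + c = %.17g\n", (a + b) + c);  /* prints 1 */
> >>>        printf("a + (b + c) = %.17g\n", a + (b + c));  /* prints 0 */
> >>>        return 0;
> >>>    }
> >>>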
> >>>Ashley,
> >>>
> >>>
> >>
> >>
> >
> >
> 
> 



